-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Ensure Substrait consumer can handle expressions in VirtualTable #16857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments, but overall this looks good to me. Thanks for fixing this ✨
} | ||
exprs.push(row_exprs); | ||
} | ||
exprs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does introduce a bit of duplication with the block below, but given that consume_expression
is async and from_substrait_literal
is not it would be a bit tricky to unify the processing. In the long-term Substrait drop support for the values
field which is deprecated so we'll be able to remove the lower block entirely.
return Ok(LogicalPlan::EmptyRelation(EmptyRelation { | ||
produce_one_row: false, | ||
schema: DFSchemaRef::new(substrait_schema), | ||
})); | ||
} | ||
|
||
let values = vt | ||
let values = if vt.values.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I would suggest checking !vt.expressions.is_empty()
to make it clearer that expressions
are preferred over values. Effectively, it's the difference between "We are processing expressions because there are no values" as opposed to "We are processing expressions because they are present".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change!
@@ -0,0 +1,94 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I would suggest we update the name to something like virtual_table_with_expressions.substrait.json
which more accurately reflects what this plan is intended to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change!
} | ||
}, | ||
"input": { | ||
"read": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I would suggest of getting rid of everything other than the read
in this plan, to make it clearer that this is a test of virtual table functionality. The aggregation in this plan is effectively noise.
} | ||
} | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this a slighly more robust test, I would suggest:
- Using a struct with more than 1 field to verify the name processing is applied correctly.
- Having more than 1 row to check that multiple rows can be processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change!
Which issue does this PR close?
Rationale for this change
Virtual tables in Substrait can have either literal values or expressions, and currently, the Substrait consumer only handles the case when values are provided (values is a deprecated field). If there are no values provided, an empty relation is returned, even if the virtual table has expressions specified.
If expressions are provided as elements in a virtual table (rather than literal values), from_read_rel() should return a LogicalPlan with the virtual table elements rather than an empty relation.
What changes are included in this PR?
Are these changes tested?
Yes, a test was added that takes a Substrait plan which has expressions as elements in a virtual table, and checks that the resulting plan is correct (and does not have an empty relation).
Are there any user-facing changes?
These changes will allow users to specify expressions in Substrait virtual tables, and get a result from their query.