Skip to content

Ensure Substrait consumer can handle expressions in VirtualTable #16857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

lorenarosati
Copy link

@lorenarosati lorenarosati commented Jul 22, 2025

Which issue does this PR close?

Rationale for this change

Virtual tables in Substrait can have either literal values or expressions, and currently, the Substrait consumer only handles the case when values are provided (values is a deprecated field). If there are no values provided, an empty relation is returned, even if the virtual table has expressions specified.

If expressions are provided as elements in a virtual table (rather than literal values), from_read_rel() should return a LogicalPlan with the virtual table elements rather than an empty relation.

What changes are included in this PR?

  • Only returns an empty relation if there are no values and no expressions specified.
  • If expressions is non-empty, we iterate through the records and consume each expression; if expressions is empty, we fallback to iterate through the records and consume each literal.

Are these changes tested?

Yes, a test was added that takes a Substrait plan which has expressions as elements in a virtual table, and checks that the resulting plan is correct (and does not have an empty relation).

Are there any user-facing changes?

These changes will allow users to specify expressions in Substrait virtual tables, and get a result from their query.

@github-actions github-actions bot added the substrait Changes to the substrait crate label Jul 22, 2025
@lorenarosati lorenarosati marked this pull request as draft July 22, 2025 19:16
@lorenarosati lorenarosati marked this pull request as ready for review July 22, 2025 20:28
Copy link
Contributor

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments, but overall this looks good to me. Thanks for fixing this ✨

}
exprs.push(row_exprs);
}
exprs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does introduce a bit of duplication with the block below, but given that consume_expression is async and from_substrait_literal is not it would be a bit tricky to unify the processing. In the long-term Substrait drop support for the values field which is deprecated so we'll be able to remove the lower block entirely.

return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(substrait_schema),
}));
}

let values = vt
let values = if vt.values.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I would suggest checking !vt.expressions.is_empty() to make it clearer that expressions are preferred over values. Effectively, it's the difference between "We are processing expressions because there are no values" as opposed to "We are processing expressions because they are present".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change!

@@ -0,0 +1,94 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I would suggest we update the name to something like virtual_table_with_expressions.substrait.json which more accurately reflects what this plan is intended to verify.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change!

}
},
"input": {
"read": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I would suggest of getting rid of everything other than the read in this plan, to make it clearer that this is a test of virtual table functionality. The aggregation in this plan is effectively noise.

}
}
]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this a slighly more robust test, I would suggest:

  • Using a struct with more than 1 field to verify the name processing is applied correctly.
  • Having more than 1 row to check that multiple rows can be processed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
substrait Changes to the substrait crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure Substrait consumer can handle expressions in VirtualTable
2 participants