Skip to content

Derived columns #367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
May 11, 2023
Merged

Derived columns #367

merged 17 commits into from
May 11, 2023

Conversation

annie
Copy link
Contributor

@annie annie commented Apr 27, 2023

Part of https://github.com/observablehq/observablehq/issues/11623

Description

This PR adds support for derived columns in the __table function. In this first version, we won't support derived columns for database sources.

Notable changes:

  • Pulled the logic for inferring types and coercing rows into a separate applyTypes function. We now run this twice, first on the source dataset, and second on the derived dataset, before merging the two. Type inference/coercion must be done separately on the derived dataset because it may depend on values in the source dataset already being coerced.
  • Added a .fullSchema property to the return value of __table. This property contains the schema information for all columns in the dataset, regardless of whether or not they are selected (.schema only contains the schema info for selected columns). In https://github.com/observablehq/observablehq/pull/11214, we switch to using the cell value to get the table schema, because derived columns aren't available on the original dataset and their types may be dynamic, so we need to look at their evaluated runtime values. We look at .fullSchema when fetching the table schema so that we always have type information for the full set of columns, so that users can e.g. reselect a deselected column in the Columns menu.

Review notes

I would love some feedback on the .fullSchema change! I'm not sure if it's the best way to make all the column types available, and I'm also not sure if I'm missing any major pitfalls with switching to use the cell value to fetch the table schema, instead of looking at the original data source as we do today. The main pitfall I experienced when testing is that, if there's an error in a derived formula, we no longer have a table schema available because the cell throws an error. I addressed that in the monorepo PR by adding a fallback that goes back to using the loaded data source + an approximation of the derived columns schema, which I think is the best we can do in that case. But perhaps there are other issues with this approach that I'm missing...

@annie
Copy link
Contributor Author

annie commented May 10, 2023

just pushed up a new commit that catches runtime errors and returns them in the function output (although it seems like i need to update tests!). will put up a corresponding monorepo PR shortly!

update: monorepo PR here

@annie annie requested a review from libbey-observable May 11, 2023 17:28
@annie annie merged commit 63bce4e into main May 11, 2023
@annie annie deleted the mkfreeman/derive-columns branch May 11, 2023 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants