Skip to content

Support Arquero tables #332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 2, 2022
Merged

Conversation

libbey-observable
Copy link
Contributor

Resolves #330

  • Identifies Arquero tables by presence of a toArrow function
  • Uses Arrow's .tableFromIPC() with Arquero's .toArrowBuffer() as source for a new DuckDBClient

@mbostock
Copy link
Member

It would be nice to add this directly to DuckDBClient so that you can use Arquero tables with DuckDBClient.of in addition to data table/SQL cells.

src/table.js Outdated
@@ -141,6 +141,11 @@ function isTypedArray(value) {
);
}

export function isArqueroTable(value) {
// Arquero tables have a `toArrow` function
return typeof value.toArrow === "function";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will crash if value is null or undefined, rather than returning false as expected. We should add value && here like we do for isArrowTable.

More importantly, though, our implementation doesn’t use value.toArrow—we use value.toArrowBuffer. And if we’re going to use value.toArrowBuffer, we should check that value.toArrowBuffer exists here rather than checking value.toArrow. Arguably, we could check both if we think there’s a risk of some other object implementing a value.toArrowBuffer method that does something different (i.e., that doesn’t return an Arrow IPC array buffer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @mbostock! I should have caught all of those. Updated in 705af99 and 7a84daf.

Comment on lines +222 to +224
const arrow = await loadArrow();
const table = arrow.tableFromIPC(source.toArrowBuffer());
return await insertArrowTable(database, name, table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked how Arquero is implemented, and source.toArrowBuffer calls source.toArrow internally.

https://github.com/uwdata/arquero/blob/1c8f438df033026535e1c4f67b4f891fa6f8f29e/src/table/column-table.js#L306-L308

https://github.com/uwdata/arquero/blob/1c8f438df033026535e1c4f67b4f891fa6f8f29e/src/format/to-arrow.js#L6-L8

So, this code is currently converting an Arquero Table to an Arrow Table to an Array Buffer then back again to an Arrow Table. You can skip two conversions by calling source.toArrow directly (and as we discussed earlier in this review, that means we should now check for the presence of source.toArrow instead of source.toArrowBuffer).

Suggested change
const arrow = await loadArrow();
const table = arrow.tableFromIPC(source.toArrowBuffer());
return await insertArrowTable(database, name, table);
return await insertArrowTable(database, name, source.toArrow());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look, Mike! I started out trying to use source.toArrow rather than source.toArrowBuffer, but ran into errors that way (see screenshot), and the result of requestDatabaseTables for the cell when constructed that way is an empty array.
Screen Shot 2022-12-01 at 7 20 13 PM

Copy link
Member

@mbostock mbostock Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What version of Arquero are you using?

Our DuckDBClient depends on Apache Arrow version 9 (or maybe 8?) or later, which means you’ll need to use Arquero version 5 or later. The version of Arquero in the Observable Standard library is currently 4.8.8 which uses Apache Arrow ^3.0.0.

So, you’ll need to load a more recent version of Arquero for this to work, and more generally, we need to implement standard library versioning so that we can upgrade Arquero to a more recent version in new notebooks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using Arquero from our stdlib; my opinion is that this should work with the Arquero in our stdlib. I added a comment in the code about upgrading Arquero when we have standard library versioning, and I suggest we update the insertArqueroTable function then. (We may also want the current implementation around for older notebooks.)

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. 👍

I hope we can get to stdlib versioning soon!

@libbey-observable libbey-observable merged commit 51a61fc into main Dec 2, 2022
@libbey-observable libbey-observable deleted the libbey/arquero-table-support branch December 2, 2022 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arquero table support (stdlib)
2 participants