-
Notifications
You must be signed in to change notification settings - Fork 84
Support Arquero tables #332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It would be nice to add this directly to DuckDBClient so that you can use Arquero tables with DuckDBClient.of in addition to data table/SQL cells. |
src/table.js
Outdated
@@ -141,6 +141,11 @@ function isTypedArray(value) { | |||
); | |||
} | |||
|
|||
export function isArqueroTable(value) { | |||
// Arquero tables have a `toArrow` function | |||
return typeof value.toArrow === "function"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash if value is null or undefined, rather than returning false as expected. We should add value &&
here like we do for isArrowTable.
More importantly, though, our implementation doesn’t use value.toArrow—we use value.toArrowBuffer. And if we’re going to use value.toArrowBuffer, we should check that value.toArrowBuffer exists here rather than checking value.toArrow. Arguably, we could check both if we think there’s a risk of some other object implementing a value.toArrowBuffer method that does something different (i.e., that doesn’t return an Arrow IPC array buffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const arrow = await loadArrow(); | ||
const table = arrow.tableFromIPC(source.toArrowBuffer()); | ||
return await insertArrowTable(database, name, table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked how Arquero is implemented, and source.toArrowBuffer calls source.toArrow internally.
So, this code is currently converting an Arquero Table to an Arrow Table to an Array Buffer then back again to an Arrow Table. You can skip two conversions by calling source.toArrow directly (and as we discussed earlier in this review, that means we should now check for the presence of source.toArrow instead of source.toArrowBuffer).
const arrow = await loadArrow(); | |
const table = arrow.tableFromIPC(source.toArrowBuffer()); | |
return await insertArrowTable(database, name, table); | |
return await insertArrowTable(database, name, source.toArrow()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What version of Arquero are you using?
Our DuckDBClient depends on Apache Arrow version 9 (or maybe 8?) or later, which means you’ll need to use Arquero version 5 or later. The version of Arquero in the Observable Standard library is currently 4.8.8 which uses Apache Arrow ^3.0.0.
So, you’ll need to load a more recent version of Arquero for this to work, and more generally, we need to implement standard library versioning so that we can upgrade Arquero to a more recent version in new notebooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using Arquero from our stdlib; my opinion is that this should work with the Arquero in our stdlib. I added a comment in the code about upgrading Arquero when we have standard library versioning, and I suggest we update the insertArqueroTable
function then. (We may also want the current implementation around for older notebooks.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. 👍
I hope we can get to stdlib versioning soon!
Resolves #330
toArrow
function.tableFromIPC()
with Arquero's.toArrowBuffer()
as source for a new DuckDBClient