Support Arquero tables #332

Merged
merged 7 commits into from
Dec 2, 2022
11 changes: 11 additions & 0 deletions src/duckdb.js
@@ -2,6 +2,7 @@ import {getArrowTableSchema, isArrowTable, loadArrow} from "./arrow.js";
import {duckdb} from "./dependencies.js";
import {FileAttachment} from "./fileAttachment.js";
import {cdn} from "./require.js";
import {isArqueroTable} from "./table.js";

// Adapted from https://observablehq.com/@cmudig/duckdb-client
// Copyright 2021 CMU Data Interaction Group
@@ -134,6 +135,8 @@ export class DuckDBClient {
await insertArrowTable(db, name, source);
} else if (Array.isArray(source)) { // bare array of objects
await insertArray(db, name, source);
} else if (isArqueroTable(source)) {
await insertArqueroTable(db, name, source);
} else if ("data" in source) { // data + options
const {data, ...options} = source;
if (isArrowTable(data)) {
@@ -215,6 +218,14 @@ async function insertArrowTable(database, name, table, options) {
}
}

async function insertArqueroTable(database, name, source) {
// TODO When we have stdlib versioning and can upgrade Arquero to version 5,
// we can then call source.toArrow() directly, with insertArrowTable()
const arrow = await loadArrow();
const table = arrow.tableFromIPC(source.toArrowBuffer());
return await insertArrowTable(database, name, table);
Comment on lines +224 to +226
Member

I checked how Arquero is implemented, and source.toArrowBuffer calls source.toArrow internally.

https://github.com/uwdata/arquero/blob/1c8f438df033026535e1c4f67b4f891fa6f8f29e/src/table/column-table.js#L306-L308

https://github.com/uwdata/arquero/blob/1c8f438df033026535e1c4f67b4f891fa6f8f29e/src/format/to-arrow.js#L6-L8

So, this code is currently converting an Arquero Table to an Arrow Table to an Array Buffer then back again to an Arrow Table. You can skip two conversions by calling source.toArrow directly (and as we discussed earlier in this review, that means we should now check for the presence of source.toArrow instead of source.toArrowBuffer).

Suggested change
- const arrow = await loadArrow();
- const table = arrow.tableFromIPC(source.toArrowBuffer());
- return await insertArrowTable(database, name, table);
+ return await insertArrowTable(database, name, source.toArrow());

Contributor Author

Thanks for taking a look, Mike! I started out trying to use source.toArrow rather than source.toArrowBuffer, but that produced errors (see screenshot), and requestDatabaseTables returned an empty array for a cell constructed that way.
[Screenshot: Screen Shot 2022-12-01 at 7 20 13 PM]

Member (@mbostock), Dec 2, 2022

What version of Arquero are you using?

Our DuckDBClient depends on Apache Arrow version 9 (or maybe 8?) or later, which means you’ll need to use Arquero version 5 or later. The version of Arquero in the Observable Standard library is currently 4.8.8 which uses Apache Arrow ^3.0.0.

So, you’ll need to load a more recent version of Arquero for this to work, and more generally, we need to implement standard library versioning so that we can upgrade Arquero to a more recent version in new notebooks.

Contributor Author

I'm using Arquero from our stdlib; my opinion is that this should work with the Arquero in our stdlib. I added a comment in the code about upgrading Arquero when we have standard library versioning, and I suggest we update the insertArqueroTable function then. (We may also want the current implementation around for older notebooks.)
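
The trade-off in this thread can be sketched as a helper that supports both paths. This is an illustration only, not the merged code: `toDuckDBArrowTable` and its `direct` option are hypothetical names, `arrow` stands for whatever module `loadArrow()` resolves to, and the stub objects exist only so the snippet runs on its own.

```javascript
// Hypothetical helper (not part of this PR): turn an Arquero table into an
// Arrow table that the DuckDB client's Arrow build can read.
// - direct: true assumes Arquero and the client use compatible Arrow versions
//   (per the thread, Arquero 5 or later), so source.toArrow() is used as-is.
// - direct: false (default) serializes to Arrow IPC bytes and decodes them with
//   the client's own Arrow module, as the merged insertArqueroTable does.
function toDuckDBArrowTable(source, arrow, {direct = false} = {}) {
  if (direct) return source.toArrow();
  return arrow.tableFromIPC(source.toArrowBuffer());
}

// Stand-ins for illustration only; real code would pass an Arquero table and
// the module returned by loadArrow().
const calls = [];
const fakeSource = {
  toArrow() {
    calls.push("toArrow");
    return {kind: "arquero-arrow-table"};
  },
  toArrowBuffer() {
    calls.push("toArrowBuffer");
    return new Uint8Array(0);
  }
};
const fakeArrow = {
  tableFromIPC(bytes) {
    calls.push("tableFromIPC");
    return {kind: "client-arrow-table", bytes};
  }
};

const viaBuffer = toDuckDBArrowTable(fakeSource, fakeArrow);
const viaDirect = toDuckDBArrowTable(fakeSource, fakeArrow, {direct: true});
console.log(viaBuffer.kind, viaDirect.kind, calls);
```

The buffer path costs extra conversions, as noted in the review, but the table handed to DuckDB is always built by the client's own Arrow module. That may be why it works with the Arquero version in the standard library.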

}

async function insertArray(database, name, array, options) {
const arrow = await loadArrow();
const table = arrow.tableFromJSON(array);
1 change: 1 addition & 0 deletions src/index.js
@@ -7,5 +7,6 @@ export {
arrayIsPrimitive,
isDataArray,
isDatabaseClient,
isArqueroTable,
__table as applyDataTableOperations
} from "./table.js";
9 changes: 7 additions & 2 deletions src/table.js
@@ -141,6 +141,11 @@ function isTypedArray(value) {
);
}

export function isArqueroTable(value) {
// Arquero tables have a `toArrowBuffer` function
return value && typeof value.toArrowBuffer === "function";
}
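
To show what this duck-typed check accepts, the sketch below repeats the function and runs it on a few stand-in values. The sample objects are hypothetical. Any object exposing a `toArrowBuffer` method passes, and falsy inputs are returned as-is rather than as `false`, so the results are coerced with `Boolean` here.

```javascript
// Same check as in the diff: a value counts as an Arquero table if it has a
// toArrowBuffer method.
function isArqueroTable(value) {
  return value && typeof value.toArrowBuffer === "function";
}

// Stand-in values for illustration only.
const tableLike = {toArrowBuffer: () => new Uint8Array(0)};
const rows = [{a: 1}, {a: 2}];

const results = {
  tableLike: Boolean(isArqueroTable(tableLike)), // has the method
  rows: Boolean(isArqueroTable(rows)), // plain array of objects
  nullish: Boolean(isArqueroTable(null)), // the raw return value is null
  wrongType: Boolean(isArqueroTable({toArrowBuffer: "not a function"}))
};
console.log(results); // only tableLike is true
```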

// __query is used by table cells; __query.sql is used by SQL cells.
export const __query = Object.assign(
async (source, operations, invalidation, name) => {
@@ -198,7 +203,7 @@ const loadTableDataSource = sourceCache(async (source, name) => {
if (/\.(arrow|parquet)$/i.test(source.name)) return loadDuckDBClient(source, name);
throw new Error(`unsupported file type: ${source.mimeType}`);
}
- if (isArrowTable(source)) return loadDuckDBClient(source, name);
+ if (isArrowTable(source) || isArqueroTable(source)) return loadDuckDBClient(source, name);
return source;
});

@@ -214,7 +219,7 @@ const loadSqlDataSource = sourceCache(async (source, name) => {
throw new Error(`unsupported file type: ${source.mimeType}`);
}
if (isDataArray(source)) return loadDuckDBClient(await asArrowTable(source, name), name);
- if (isArrowTable(source)) return loadDuckDBClient(source, name);
+ if (isArrowTable(source) || isArqueroTable(source)) return loadDuckDBClient(source, name);
return source;
});
