XLSX support with ExcelJS #248

Merged
merged 35 commits into from
Sep 15, 2021
Changes from 5 commits
Commits
a8f7998
XLSX support with ExcelJS
visnup Sep 3, 2021
38fceab
Prettier
visnup Sep 3, 2021
9226446
Change range option to nested arrays
visnup Sep 4, 2021
77159f0
Tests and bug fixes
visnup Sep 5, 2021
d8904d0
Respect header row order when resolving conflicts
visnup Sep 5, 2021
ee0dfbf
Fil/xlsx (#249)
Fil Sep 7, 2021
fd177b0
Column only range test case
visnup Sep 5, 2021
f6ddcff
sheetNames is enumerable
visnup Sep 7, 2021
9b9eab6
One more test to check for empty columns
visnup Sep 8, 2021
a845086
Add Node 16 to the test matrix
visnup Sep 9, 2021
f30b626
Revert reporter to classic for Node 16
visnup Sep 9, 2021
e421983
Don't fail matrix quickly in actions
visnup Sep 9, 2021
e8b0153
More coverage.
visnup Sep 9, 2021
e7c82d4
Example of .xlsx in README
visnup Sep 9, 2021
1440400
Remove Excel from Workbook naming
visnup Sep 9, 2021
d444ebe
Fix dates
visnup Sep 9, 2021
410f4c9
Fix for sharedFormula
visnup Sep 9, 2021
57cb0e0
Coerce errors to NaN
visnup Sep 10, 2021
e2976b1
Properly escape html
visnup Sep 10, 2021
e618095
Merge branch 'main' into visnup/xlsx
visnup Sep 10, 2021
b97b9f6
Make sheetNames read-only
visnup Sep 10, 2021
e5eb8d6
Require colons in range specifiers
visnup Sep 10, 2021
81433c6
Include row numbers
visnup Sep 10, 2021
2f26284
Use only string form ranges
visnup Sep 12, 2021
66a539c
Coerce range specifiers to strings
visnup Sep 12, 2021
fcb6eb7
Update README.md
visnup Sep 13, 2021
162d55e
Apply suggestions from code review
visnup Sep 13, 2021
9c9e91b
Simplify hyperlinks
visnup Sep 14, 2021
1a0345e
Prettier
visnup Sep 14, 2021
5daef26
Pass options through
visnup Sep 14, 2021
92c4af1
Rename helper functions for clarity, range tests
visnup Sep 14, 2021
6f13d59
Simpler
visnup Sep 14, 2021
c52b73f
Consistent comment format
visnup Sep 14, 2021
5b21a79
Consistent regexes
visnup Sep 14, 2021
0a59d0c
Fix hyperlinks for certain cases
visnup Sep 14, 2021
1 change: 1 addition & 0 deletions src/dependencies.js
@@ -16,3 +16,4 @@ export const vegaliteApi = dependency("vega-lite-api", "5.0.0", "build/vega-lite
 export const arrow = dependency("apache-arrow", "4.0.1", "Arrow.es2015.min.js");
 export const arquero = dependency("arquero", "4.8.4", "dist/arquero.min.js");
 export const topojson = dependency("topojson-client", "3.1.0", "dist/topojson-client.min.js");
+export const exceljs = dependency("exceljs", "4.3.0", "dist/exceljs.min.js");
7 changes: 6 additions & 1 deletion src/fileAttachment.js
@@ -1,7 +1,8 @@
 import {autoType, csvParse, csvParseRows, tsvParse, tsvParseRows} from "d3-dsv";
 import {require as requireDefault} from "d3-require";
-import {arrow, jszip} from "./dependencies.js";
+import {arrow, jszip, exceljs} from "./dependencies.js";
 import {SQLiteDatabaseClient} from "./sqlite.js";
+import {ExcelWorkbook} from "./xlsx.js";
 
 async function remote_fetch(file) {
   const response = await fetch(await file.url());
@@ -70,6 +71,10 @@ class AbstractFile {
   async html() {
     return this.xml("text/html");
   }
+  async xlsx() {
+    const [ExcelJS, buffer] = await Promise.all([requireDefault(exceljs.resolve()), this.arrayBuffer()]);
+    return new ExcelWorkbook(await new ExcelJS.Workbook().xlsx.load(buffer));
+  }
 }
 
 class FileAttachment extends AbstractFile {
99 changes: 99 additions & 0 deletions src/xlsx.js
@@ -0,0 +1,99 @@
export class ExcelWorkbook {
  constructor(workbook) {
    Object.defineProperty(this, "_", {value: workbook});
  }
  sheetNames() {
    return this._.worksheets.map((sheet) => sheet.name);
  }
  sheet(name, {range, headers = false} = {}) {
    const sheet = this._.getWorksheet(
      typeof name === "number" ? this.sheetNames()[name] : name + ""
    );
    if (!sheet) throw new Error(`Sheet not found: ${name}`);
    return extract(sheet, {range, headers});
  }
}

function extract(sheet, {range, headers}) {
  let [[c0, r0], [c1, r1]] = parseRange(range, sheet);
  const seen = new Set();
  const names = [];
  const headerRow = headers && sheet._rows[r0++];
  function name(n) {
    if (!names[n]) {
      let name = (headerRow ? valueOf(headerRow._cells[n]) : AA(n)) || AA(n);
      while (seen.has(name)) name += "_";
      seen.add((names[n] = name));
    }
    return names[n];
  }
  if (headerRow) for (let c = c0; c <= c1; c++) name(c);

  const output = new Array(r1 - r0 + 1).fill({});
Member

This is filling the output with a shared empty object for all rows, whereas the rows with values are reassigned below to new objects. Do we want to use an empty object to represent rows without values, rather than undefined? If we do want to use an empty object, I think we'll still want a distinct object for each row, rather than sharing one object across rows. That could be done by moving the output[r - r0] = {} assignment below to before the continue, rather than using array.fill here.
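
The aliasing concern raised in this comment can be reproduced outside the PR. The following is a minimal sketch in plain JavaScript, with hypothetical variable names that do not appear in the diff: `Array.prototype.fill` stores the same object reference in every slot, while `Array.from` with a mapping callback creates one object per slot.

```javascript
// Array.prototype.fill stores one object reference in every slot.
const shared = new Array(3).fill({});
shared[0].A = 1; // mutating one "row" is visible through every slot
const sharedIsAliased = shared[0] === shared[2];

// Array.from calls the mapping function once per slot, so each row is distinct.
const distinct = Array.from({length: 3}, () => ({}));
distinct[0].A = 1;
const distinctIsAliased = distinct[0] === distinct[2];

console.log(sharedIsAliased, shared[2]); // true { A: 1 }
console.log(distinctIsAliased, distinct[2]); // false {}
```

In the diff itself the shared object is only exposed for rows without values, since rows with values are reassigned to fresh objects, but any caller that mutates an empty row would affect all other empty rows.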

Member Author

@visnup visnup Sep 7, 2021

I tried to use sparseness/undefined initially, but Inputs.table had trouble with it, throwing an error when trying to get a field from an undefined row. Reusing the same object was a premature optimization for memory usage. I also toyed with using a Symbol("empty") to make it even more explicit.

I'm slightly inclined to auto-filter these rows out of the return value, since I feel that would be one of the first things I'd always end up writing in the notebook anyway, but that also seemed like possibly surprising behavior.
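
For reference, the filtering step mentioned in this reply is a one-liner on the caller's side. This is a sketch under the assumption that empty rows are represented as `{}`, as in the diff at this commit; the sample data and variable names are hypothetical.

```javascript
// Rows as the sheet extraction might return them; {} marks a row without values.
const rows = [{A: 1, B: 2}, {}, {A: 3}, {}];

// Keep only rows that have at least one own enumerable property.
const nonEmpty = rows.filter((row) => Object.keys(row).length > 0);

console.log(nonEmpty); // [ { A: 1, B: 2 }, { A: 3 } ]
```

Leaving this to the caller keeps row positions aligned with the sheet, which is the trade-off the reply describes as "possibly surprising" if done automatically.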

  for (let r = r0; r <= r1; r++) {
    const _row = sheet._rows[r];
    if (!_row || !_row.hasValues) continue;
    const row = (output[r - r0] = {});
    for (let c = c0; c <= c1; c++) {
      const value = valueOf(_row._cells[c]);
      if (value !== null && value !== undefined) row[name(c)] = value;
    }
  }

  output.columns = names.filter(() => true);
  return output;
}

function valueOf(cell) {
  if (!cell) return;
  const {value} = cell;
  if (value && typeof value === "object") {
    if (value.formula) return value.result;
    if (value.richText) return value.richText.map((d) => d.text).join("");
    if (value.text && value.hyperlink)
      return `<a href="${value.hyperlink}">${value.text}</a>`;
  }
  return value;
}

function parseRange(specifier = [], {columnCount, rowCount}) {
  if (typeof specifier === "string") {
    const [
      [c0 = 0, r0 = 0] = [],
      [c1 = columnCount - 1, r1 = rowCount - 1] = [],
    ] = specifier.split(":").map(NN);
    return [
      [c0, r0],
      [c1, r1],
    ];
  } else if (typeof specifier === "object") {
    const [
      [c0 = 0, r0 = 0] = [],
      [c1 = columnCount - 1, r1 = rowCount - 1] = [],
    ] = specifier;
    return [
      [c0, r0],
      [c1, r1],
    ];
  }
}

function AA(c) {
  let sc = "";
  c++;
  do {
    sc = String.fromCharCode(64 + (c % 26 || 26)) + sc;
  } while ((c = Math.floor((c - 1) / 26)));
  return sc;
}

function NN(s = "") {
  const [, sc, sr] = s.match(/^([a-zA-Z]+)?(\d+)?$/);
  let c = undefined;
  if (sc) {
    c = 0;
    for (let i = 0; i < sc.length; i++)
      c += Math.pow(26, sc.length - i - 1) * (sc.charCodeAt(i) - 64);
  }
  return [c && c - 1, sr && +sr - 1];
}
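
The column-letter and range helpers above are terse, so here is a standalone sketch of the same idea for readers who want to experiment with it. The names `columnName`, `parseCell`, and `parseRangeString` are hypothetical and not part of the PR. Upper-casing the letters and throwing an explicit error on malformed input are additions of this sketch, not behavior taken from the diff.

```javascript
// Hypothetical re-statement of the helpers above, for illustration only.

// Zero-based column index -> spreadsheet letters (bijective base 26).
function columnName(index) {
  let n = index + 1; // switch to one-based
  let letters = "";
  while (n > 0) {
    const digit = (n - 1) % 26; // 0..25
    letters = String.fromCharCode(65 + digit) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

// Cell reference such as "B2" -> [column, row], zero-based.
// A missing part is left undefined so the caller can apply defaults.
function parseCell(ref = "") {
  const match = ref.match(/^([A-Za-z]+)?(\d+)?$/);
  if (!match) throw new Error(`Malformed cell reference: ${ref}`);
  const [, letters, digits] = match;
  let column;
  if (letters) {
    column = 0;
    // Upper-casing first is an addition of this sketch.
    for (const ch of letters.toUpperCase()) {
      column = column * 26 + (ch.charCodeAt(0) - 64);
    }
    column -= 1;
  }
  const row = digits ? Number(digits) - 1 : undefined;
  return [column, row];
}

// Range such as "B2:C3" -> [[c0, r0], [c1, r1]], defaulting to the sheet bounds.
function parseRangeString(range, {columnCount, rowCount}) {
  const [[c0 = 0, r0 = 0] = [], [c1 = columnCount - 1, r1 = rowCount - 1] = []] =
    String(range).split(":").map((part) => parseCell(part));
  return [[c0, r0], [c1, r1]];
}

console.log(columnName(0), columnName(25), columnName(26)); // A Z AA
console.log(parseRangeString("B2:C3", {columnCount: 10, rowCount: 4})); // [ [ 1, 1 ], [ 2, 2 ] ]
```

The destructuring defaults do the work of filling in omitted corners: `":C3"` resolves to the top-left of the sheet through C3, and `"B2"` resolves to B2 through the bottom-right of the sheet.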
156 changes: 156 additions & 0 deletions test/xlsx-test.js
@@ -0,0 +1,156 @@
import {test} from "tap";
import {ExcelWorkbook} from "../src/xlsx.js";

function mockWorkbook(contents) {
  return {
    worksheets: Object.keys(contents).map((name) => ({name})),
    getWorksheet(name) {
      const _rows = contents[name];
      return {
        _rows: _rows.map((row) => ({
          _cells: row.map((cell) => ({value: cell})),
          hasValues: !!row.length,
        })),
        rowCount: _rows.length,
        columnCount: Math.max(..._rows.map((r) => r.length)),
      };
    },
  };
}

test("FileAttachment.xlsx reads sheet names", (t) => {
  const workbook = new ExcelWorkbook(mockWorkbook({Sheet1: []}));
  t.same(workbook.sheetNames(), ["Sheet1"]);
  t.end();
});

test("FileAttachment.xlsx reads sheets", (t) => {
  const workbook = new ExcelWorkbook(
    mockWorkbook({
      Sheet1: [
        ["one", "two", "three"],
        [1, 2, 3],
      ],
    })
  );
  t.same(workbook.sheet(0), [
    {A: "one", B: "two", C: "three"},
    {A: 1, B: 2, C: 3},
  ]);
  t.end();
});

test("FileAttachment.xlsx reads sheets with different types", (t) => {
  const workbook = new ExcelWorkbook(
    mockWorkbook({
      Sheet1: [
        ["one", {richText: [{text: "two"}, {text: "three"}]}],
        [
          {text: "link", hyperlink: "https://example.com"},
          2,
          {formula: "=B2*5", result: 10},
        ],
      ],
    })
  );
  t.same(workbook.sheet(0), [
    {A: "one", B: "twothree"},
    {A: `<a href="https://example.com">link</a>`, B: 2, C: 10},
  ]);
  t.end();
});

test("FileAttachment.xlsx reads sheets with headers", (t) => {
  const workbook = new ExcelWorkbook(
    mockWorkbook({
      Sheet1: [
        [null, "one", "one", "two", "A"],
        [1, null, 3, 4, 5],
        [6, 7, 8, 9, 10],
      ],
    })
  );
  t.same(workbook.sheet(0, {headers: true}), [
    {A: 1, one_: 3, two: 4, A_: 5},
    {A: 6, one: 7, one_: 8, two: 9, A_: 10},
  ]);
  t.end();
});

test("FileAttachment.xlsx reads sheet ranges", (t) => {
  const workbook = new ExcelWorkbook(
    mockWorkbook({
      Sheet1: [
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
      ],
    })
  );

  // undefined
  // ""
  // []
  const entireSheet = [
    {A: 0, B: 1, C: 2, D: 3, E: 4, F: 5, G: 6, H: 7, I: 8, J: 9},
    {A: 10, B: 11, C: 12, D: 13, E: 14, F: 15, G: 16, H: 17, I: 18, J: 19},
    {A: 20, B: 21, C: 22, D: 23, E: 24, F: 25, G: 26, H: 27, I: 28, J: 29},
    {A: 30, B: 31, C: 32, D: 33, E: 34, F: 35, G: 36, H: 37, I: 38, J: 39},
  ];
  t.same(workbook.sheet(0), entireSheet);
  t.same(workbook.sheet(0, {range: ""}), entireSheet);
  t.same(workbook.sheet(0, {range: []}), entireSheet);

  // "B2:C3"
  // [[1,1],[2,2]]
  t.same(workbook.sheet(0, {range: "B2:C3"}), [
    {B: 11, C: 12},
    {B: 21, C: 22},
  ]);
  t.same(
    workbook.sheet(0, {
      range: [
        [1, 1],
        [2, 2],
      ],
    }),
    [
      {B: 11, C: 12},
      {B: 21, C: 22},
    ]
  );

  // ":C3"
  // [,[2,2]]
  t.same(workbook.sheet(0, {range: ":C3"}), [
    {A: 0, B: 1, C: 2},
    {A: 10, B: 11, C: 12},
    {A: 20, B: 21, C: 22},
  ]);
  t.same(workbook.sheet(0, {range: [undefined, [2, 2]]}), [
    {A: 0, B: 1, C: 2},
    {A: 10, B: 11, C: 12},
    {A: 20, B: 21, C: 22},
  ]);

  // "B2"
  // [[1,1]]
  t.same(workbook.sheet(0, {range: "B2"}), [
    {B: 11, C: 12, D: 13, E: 14, F: 15, G: 16, H: 17, I: 18, J: 19},
    {B: 21, C: 22, D: 23, E: 24, F: 25, G: 26, H: 27, I: 28, J: 29},
    {B: 31, C: 32, D: 33, E: 34, F: 35, G: 36, H: 37, I: 38, J: 39},
  ]);
  t.same(workbook.sheet(0, {range: [[1, 1]]}), [
    {B: 11, C: 12, D: 13, E: 14, F: 15, G: 16, H: 17, I: 18, J: 19},
    {B: 21, C: 22, D: 23, E: 24, F: 25, G: 26, H: 27, I: 28, J: 29},
    {B: 31, C: 32, D: 33, E: 34, F: 35, G: 36, H: 37, I: 38, J: 39},
  ]);

  // "2"
  // [[,1]]
  t.same(workbook.sheet(0, {range: "2"}), entireSheet.slice(1));
  t.same(workbook.sheet(0, {range: [[undefined, 1]]}), entireSheet.slice(1));

  t.end();
});
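
The "reads sheets with headers" test above relies on how duplicate and missing header names are resolved. The following standalone sketch shows that naming rule in isolation. `uniqueNames` and `letter` are hypothetical helpers written for this example and are not functions from the PR; the sketch treats only `null`, `undefined`, and the empty string as missing, which is an assumption and slightly narrower than the `||` fallback in the diff.

```javascript
// Hypothetical helper: derive unique column names from a header row.
function uniqueNames(headerRow, fallback) {
  const seen = new Set();
  return headerRow.map((value, i) => {
    // Empty header cells fall back to a generated name.
    let name = value == null || value === "" ? fallback(i) : String(value);
    // Append underscores until the name no longer collides.
    while (seen.has(name)) name += "_";
    seen.add(name);
    return name;
  });
}

// Single-letter fallback, enough for this example (columns A..Z only).
const letter = (i) => String.fromCharCode(65 + i);

const names = uniqueNames([null, "one", "one", "two", "A"], letter);
console.log(names); // [ 'A', 'one', 'one_', 'two', 'A_' ]
```

Names are assigned left to right, so the first column claims the fallback name "A" and the literal header "A" in the last column becomes "A_", which matches the keys expected by the test.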