Performance regression in V3 #2710

Open
@y4n9squared

Zarr version

3.0.0

Numcodecs version

0.14.1

Python Version

3.13

Operating System

Linux

Installation

Using uv

Description

This simple workload, which writes out the numbers 0 through 2**30 - 1 (about 1e9 values) in 64 separate chunks,

# Using zarr==2.18.2
import numpy as np
import zarr
from zarr._storage.v3 import DirectoryStoreV3

store = DirectoryStoreV3("/tmp/foo.zarr")
arr = zarr.array(np.arange(1024 * 1024 * 1024, dtype=np.float64), chunks=(1024 * 1024 * 16,))
zarr.save_array(store, arr, zarr_version=3, path="/")

runs in about 5s on my machine on version 2.18.
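
For scale, the sizes implied by the shape and chunking in the snippet can be checked with a few lines of arithmetic. This is only a sanity check on the uncompressed sizes, assuming 8 bytes per float64 element; the variable names are illustrative and nothing here touches zarr itself.

```python
# Size arithmetic for the workload (float64 = 8 bytes per element).
n_elements = 1024 * 1024 * 1024       # total array length, 2**30
chunk_elements = 1024 * 1024 * 16     # elements per chunk, 2**24
itemsize = 8                          # bytes per float64 value

n_chunks = n_elements // chunk_elements           # number of chunks written
total_gib = n_elements * itemsize / 2**30         # uncompressed array size in GiB
chunk_mib = chunk_elements * itemsize / 2**20     # uncompressed chunk size in MiB

print(n_chunks, total_gib, chunk_mib)  # 64 8.0 128.0
```

So each of the 64 chunks holds 128 MiB of uncompressed data, and the whole array is 8 GiB before compression.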

The equivalent workload on version 3 takes over a minute:

# Using zarr==3.0
import numpy as np
import zarr
import zarr.codecs
from zarr.storage import LocalStore

store = LocalStore("/tmp/bar.zarr")

compressors = zarr.codecs.BloscCodec(cname='lz4', shuffle=zarr.codecs.BloscShuffle.bitshuffle)

za = zarr.create_array(
    store,
    shape=(1024 * 1024 * 1024,),
    chunks=(1024 * 1024 * 16,),
    dtype=np.float64,
    compressors=compressors,
)

arr = np.arange(1024 * 1024 * 1024, dtype=np.float64)
za[:] = arr
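
To compare the two versions on equal footing, the write step in each script could be wrapped in a small timing helper along these lines. This is a rough sketch, not how the numbers above were obtained; `time_call` and `throughput_gib_per_s` are hypothetical helper names, and only the standard library is used.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 1) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def throughput_gib_per_s(n_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GiB per second."""
    return n_bytes / 2**30 / seconds


# Example with a stand-in workload; in the scripts above the callable would be
# the write itself, e.g. `lambda: za.__setitem__(slice(None), arr)`.
elapsed = time_call(lambda: sum(range(100_000)), repeats=3)
print(f"best of 3: {elapsed:.6f}s")
```

With the figures reported here, 8 GiB of uncompressed data in about 5s corresponds to roughly 1.6 GiB/s, while the same data in 60s or more is at most about 0.13 GiB/s.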

Steps to reproduce

See above

Additional output

No response

Labels: bug (potential issues with the zarr-python library), performance (potential issues with Zarr performance: I/O, memory, etc.)
