Closed
Labels
virtual references 👻: Involves virtual kerchunk/virtualizarr chunk references
Description
To create and use virtual datasets with Python, users will want to use kerchunk and virtualizarr. Both are just starting down the path to zarr 3 and icechunk compatibility. This issue will be used to track progress and relevant PRs:
- Support writing to icechunk from Virtualizarr: "Add Icechunk Support" (zarr-developers/VirtualiZarr#256); "Writing virtual references into Icechunk from VirtualiZarr" (#103)
- Support zarr 3 codecs in Virtualizarr: "Fix v3 codec pipeline" (VirtualiZarr#4)
- Zarr 3 support for kerchunk: "zarr-python v3 compatibility" (fsspec/kerchunk#516)
- Numcodecs zarr 3 wrapper: "Add wrappers for zarr v3" (zarr-developers/numcodecs#524) + "Sync with zarr 3 beta" (zarr-developers/numcodecs#597)
- Xarray zarr 3 support: "Compatibility for zarr-python 3.x" (pydata/xarray#9552)
All of this can be installed with pip. However, for now we need to install in three steps to avoid version conflicts:
pip install icechunk xarray VirtualiZarr kerchunk
This also assumes that fsspec, s3fs, and the HDF5 readers (h5py, h5netcdf) are installed:
pip install fsspec s3fs h5py h5netcdf
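Since version conflicts are the main risk here, it can help to confirm what actually got installed before running the example. The following is a minimal sketch using only the standard library; the `installed_versions` helper and the `PACKAGES` list are illustrative names, not part of any of these projects, and `zarr` is included on the assumption that it is pulled in as a dependency.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the two pip commands above, plus zarr
# (assumed to be installed as a dependency of the others).
PACKAGES = [
    "icechunk", "xarray", "virtualizarr", "kerchunk",
    "zarr", "fsspec", "s3fs", "h5py", "h5netcdf",
]


def installed_versions(packages):
    """Return {name: version string, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:>14}: {ver or 'NOT INSTALLED'}")
```

Pasting this output into a bug report makes it easier to tell which combination of versions was in use.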
With all of this installed, HDF5 virtual datasets currently work like this:
from datetime import datetime, timezone
import icechunk
import xarray as xr
import virtualizarr
url = 's3://met-office-atmospheric-model-data/global-deterministic-10km/20250204T0000Z/20250204T0000Z-PT0000H00M-pressure_at_mean_sea_level.nc'
so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
# create virtualizarr dataset
vds = virtualizarr.open_virtual_dataset(url, reader_options={'storage_options': so}, indexes={})
# create an icechunk repo that can read virtual chunks from the eu-west-2 region with anonymous access
storage = icechunk.local_filesystem_storage("./ukmet")
config = icechunk.RepositoryConfig.default()
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("s3", "s3://", icechunk.s3_store(region="eu-west-2")))
credentials = icechunk.containers_credentials(s3=icechunk.s3_credentials(anonymous=True))
repo = icechunk.Repository.create(storage, config, credentials)
# create a session, and write to a group inside it using virtualizarr
session = repo.writable_session("main")
vds.virtualize.to_icechunk(session.store, group="msl", last_updated_at=datetime.now(timezone.utc))
# commit to save progress
session.commit("Add msl pressure")
# open it back up
ds = xr.open_zarr(session.store, group="msl", zarr_format=3, consolidated=False, decode_times=False)
ds
# plot!
ds.air_pressure_at_sea_level.plot()
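To try the same workflow on a different file, the URL can be assembled from its parts rather than typed by hand. This is only a sketch: the key layout is inferred from the single example URL above and is an assumption, not something documented in this issue, and the `met_office_url` function and its parameters are hypothetical names. Whether other runs, lead times, or variables exist in the bucket is not checked here.

```python
from datetime import datetime, timedelta, timezone

# Both constants are copied from the example URL above.
BUCKET = "s3://met-office-atmospheric-model-data"
MODEL = "global-deterministic-10km"


def met_office_url(run_time, lead_time, variable):
    """Build a URL following the pattern of the example above.

    Assumed layout (inferred from one URL only):
    <bucket>/<model>/<run>/<run>-PT<hhhh>H<mm>M-<variable>.nc
    """
    stamp = run_time.strftime("%Y%m%dT%H%MZ")
    total_minutes = int(lead_time.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return (
        f"{BUCKET}/{MODEL}/{stamp}/"
        f"{stamp}-PT{hours:04d}H{minutes:02d}M-{variable}.nc"
    )


# Reproduces the URL used in the example above.
url = met_office_url(
    datetime(2025, 2, 4, 0, 0, tzinfo=timezone.utc),
    timedelta(0),
    "pressure_at_mean_sea_level",
)
```

The resulting `url` can be passed to `virtualizarr.open_virtual_dataset` exactly as in the example.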
Updated 2/4/2025