
Add xarray-specific encoding convention for pd.IntervalArray #10483


Open: wants to merge 2 commits into main


@dcherian (Contributor) commented Jul 1, 2025

Closes #2847

Following the proposal in #8005 (comment), this PR adds encoding/decoding machinery for pd.IntervalArray objects. I use an ad-hoc convention:

  1. The data are stacked into a 2D array whose first dimension is named __xarray_bounds__. This is not configurable at the moment.
  2. We record the encoding attributes "closed" and "dtype" (the latter is always "pandas_interval").

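Because an IntervalArray can hold datetime or timedelta endpoints, I've placed the IntervalCoder first in the encoding pipeline and last in the decoding pipeline. That way it stays independent of the datetime/timedelta coders: on encode, the intervals are already split into plain datetime/timedelta bounds before those coders run, and on decode, the intervals are reassembled only after those coders have restored the bounds.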

TODO:

  • Add a whats-new entry
  • Add decode_intervals kwarg?
  • Add docs to the "Internals" section.

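Review thread on the IntervalCoder test code:

```python
coder = variables.IntervalCoder()
encoded = coder.encode(v)
expected = xr.Variable(
    dims=("__xarray_bounds__", "time"),
    # ... (the rest of this call is not shown in the review excerpt)
```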
Comment by @dcherian (Contributor Author):

Could make this the trailing dimension.

Successfully merging this pull request may close this issue: Cannot store data after groupby_bins