Recreate @gajomi's #2070 to keep attrs when calling astype() #4314
Conversation
Looks great! One comment and then good-to-go on the code
xarray/tests/test_dataarray.py
def test_astype_attrs(self):
    mda1 = self.mda.copy()
    mda1.attrs["foo"] = "bar"
    mda2 = mda1.astype(bool)
Do we already have tests for .astype working correctly? If not, could we add a couple of simple cases (no need to do too much, tbc)?
Thanks @max-sixty. I pushed a second commit, including adding a test. What do you make of the failing tests?
I'm not sure about the tests at first glance; I don't see how those tests are related, but master seems to pass.
Looks like sparse doesn't support the casting kwarg.
I've opened an issue at sparse. Let's see what the outcome is.
Well, that was fixed quickly! Now we need to think of backcompat workarounds.
What do you think about reraising the error with a message about upgrading? Is this a narrow enough use case that we can avoid a backward-compat fix in xarray?
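For illustration, that reraise-with-an-upgrade-hint idea could look roughly like the following sketch (hypothetical code; it is not what ends up in the patch further down):

def astype(data, **kwargs):
    # Hypothetical sketch of the "reraise with an upgrade message" idea:
    # older sparse releases reject the 'casting' keyword with a TypeError,
    # so point the user at upgrading instead of surfacing the raw error.
    try:
        return data.astype(**kwargs)
    except TypeError as e:
        if "casting" in kwargs:
            raise TypeError(
                "astype failed, possibly because the installed version of "
                "sparse does not accept the 'casting' keyword; upgrading "
                "sparse may fix this"
            ) from e
        raise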
Nice! Thanks a million @dcherian for raising that issue over at sparse. @max-sixty regarding the minimum version, it seems like sparse has a relatively infrequent release cycle (many months), so it could be a while before we have a minimum version to require. Could we check whether we are dealing with a sparse array and call the numpy method instead for that use case, until sparse supports it?
For sure — good idea to defer to numpy rather than raise — should we do that only on existing versions though? Otherwise we won't roll over to the new version when they do release their next version?
I wonder if we should just let it raise (but xfail our test on sparse < 0.10.0). This is an "imperfect NEP-18 implementation" issue and it isn't feasible to catch all such errors.
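A version-gated xfail along those lines might look like this minimal sketch (the test name and body are illustrative assumptions, not the actual test added in this PR):

from distutils.version import LooseVersion

import numpy as np
import pytest
import sparse
import xarray as xr


# Sketch: mark a sparse astype test as an expected failure when the
# installed sparse predates support for the 'casting' keyword.
@pytest.mark.xfail(
    LooseVersion(sparse.__version__) < LooseVersion("0.10.0"),
    reason="sparse < 0.10.0 rejects the 'casting' keyword passed by astype",
)
def test_astype_sparse():
    arr = xr.DataArray(sparse.COO.from_numpy(np.eye(3)))
    result = arr.astype(float)
    assert result.dtype == float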
My solution of try/except and popping casting from kwargs felt a bit kludgey, so I'm definitely open to other ideas. Letting it raise and xfailing the test would mean that we can no longer perform astype on sparse-backed objects with current sparse releases.
I'm fine with letting it raise — though I think we should reraise, otherwise isn't it a confusing message? The try-except is probably fine given the narrowness (very open to the idea that I'm holding too low a standard, though); one step up would be checking it is a sparse array.
Maybe I am overcomplicating things, but I came up with this solution: catch the error; if it is the unallowed-casting-kwarg error and we are dealing with a sparse array, pop the casting kwarg and issue a RuntimeWarning that we are doing so; otherwise, raise the original error, since it might be something else not related to sparse. Does that seem reasonable?
Hello @dnowacki-usgs! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-19 19:20:14 UTC
I think(?) the PEP 8 warnings are from the commits @max-sixty pushed several days ago during testing.
I think that's a false positive.
Great — I think this looks good — any final thoughts?
Somehow the contents of #4248 got in here. Can you try to merge master and see if they disappear?
Did a fetch and rebase; not sure why commits from other PRs that were added to master in the meantime show up in this one 🤷
Weird - it got even worse. I can try tomorrow morning (European time) unless @max-sixty wants to have a go.
The weirdest part is that the local diff is fine; here's the output. Sorry if I contributed to this. One option is to apply this patch on master and force-push this branch. I'm happy to do it if @dnowacki-usgs agrees.
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index 793f72fb..4b7b117b 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -21,7 +21,9 @@ v0.16.1 (unreleased)
Breaking changes
~~~~~~~~~~~~~~~~
-
+- :py:meth:`DataArray.astype` and :py:meth:`Dataset.astype` now preserve attributes. Keep the
+ old behavior by passing `keep_attrs=False` (:issue:`2049`, :pull:`4314`).
+ By `Dan Nowacki <https://github.com/dnowacki-usgs>`_ and `Gabriel Joel Mitchell <https://github.com/gajomi>`_.
New Features
~~~~~~~~~~~~
diff --git a/xarray/core/common.py b/xarray/core/common.py
index bc5035b6..7b6fc13c 100644
--- a/xarray/core/common.py
+++ b/xarray/core/common.py
@@ -1305,6 +1305,48 @@ class DataWithCoords(SupportsArithmetic, AttrAccessMixin):
dask="allowed",
)
+ def astype(self, dtype, casting="unsafe", copy=True, keep_attrs=True):
+ """
+ Copy of the xarray object, with data cast to a specified type.
+ Leaves coordinate dtype unchanged.
+
+ Parameters
+ ----------
+ dtype : str or dtype
+ Typecode or data-type to which the array is cast.
+ casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
+ Controls what kind of data casting may occur. Defaults to 'unsafe'
+ for backwards compatibility.
+
+ * 'no' means the data types should not be cast at all.
+ * 'equiv' means only byte-order changes are allowed.
+ * 'safe' means only casts which can preserve values are allowed.
+ * 'same_kind' means only safe casts or casts within a kind,
+ like float64 to float32, are allowed.
+ * 'unsafe' means any data conversions may be done.
+ copy : bool, optional
+ By default, astype always returns a newly allocated array. If this
+ is set to False and the `dtype` requirement is satisfied, the input
+ array is returned instead of a copy.
+ keep_attrs : bool, optional
+ By default, astype keeps attributes. Set to False to remove
+ attributes in the returned object.
+
+ See also
+ --------
+ np.ndarray.astype
+ dask.array.Array.astype
+ """
+ from .computation import apply_ufunc
+
+ return apply_ufunc(
+ duck_array_ops.astype,
+ self,
+ kwargs=dict(dtype=dtype, casting=casting, copy=copy),
+ keep_attrs=keep_attrs,
+ dask="allowed",
+ )
+
def __enter__(self: T) -> T:
return self
diff --git a/xarray/core/duck_array_ops.py b/xarray/core/duck_array_ops.py
index 3d192882..7b28d396 100644
--- a/xarray/core/duck_array_ops.py
+++ b/xarray/core/duck_array_ops.py
@@ -149,6 +149,33 @@ masked_invalid = _dask_or_eager_func(
)
+def astype(data, **kwargs):
+ try:
+ return data.astype(**kwargs)
+ except TypeError as e:
+ # FIXME: This should no longer be necessary in future versions of sparse
+ # Current versions of sparse (v0.10.0) don't support the "casting" kwarg
+ # This was fixed by https://github.com/pydata/sparse/pull/392
+ try:
+ import sparse
+ except ImportError:
+ sparse = None
+ if (
+ "got an unexpected keyword argument 'casting'" in repr(e)
+ and sparse is not None
+ and isinstance(data, sparse._coo.core.COO)
+ ):
+ warnings.warn(
+ "The current version of sparse does not support the 'casting' argument. It will be ignored in the call to astype().",
+ RuntimeWarning,
+ stacklevel=4,
+ )
+ kwargs.pop("casting")
+ else:
+ raise e
+ return data.astype(**kwargs)
+
+
def asarray(data, xp=np):
return (
data
diff --git a/xarray/core/ops.py b/xarray/core/ops.py
index 36753179..dad37f3b 100644
--- a/xarray/core/ops.py
+++ b/xarray/core/ops.py
@@ -42,7 +42,7 @@ NUM_BINARY_OPS = [
NUMPY_SAME_METHODS = ["item", "searchsorted"]
# methods which don't modify the data shape, so the result should still be
# wrapped in an Variable/DataArray
-NUMPY_UNARY_METHODS = ["astype", "argsort", "clip", "conj", "conjugate"]
+NUMPY_UNARY_METHODS = ["argsort", "clip", "conj", "conjugate"]
PANDAS_UNARY_FUNCTIONS = ["isnull", "notnull"]
# methods which remove an axis
REDUCE_METHODS = ["all", "any"]
diff --git a/xarray/core/variable.py b/xarray/core/variable.py
index 1f86a403..6d53a4fe 100644
--- a/xarray/core/variable.py
+++ b/xarray/core/variable.py
@@ -360,6 +360,47 @@ class Variable(
)
self._data = data
+ def astype(self, dtype, casting="unsafe", copy=True, keep_attrs=True):
+ """
+ Copy of the Variable object, with data cast to a specified type.
+
+ Parameters
+ ----------
+ dtype : str or dtype
+ Typecode or data-type to which the array is cast.
+ casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
+ Controls what kind of data casting may occur. Defaults to 'unsafe'
+ for backwards compatibility.
+
+ * 'no' means the data types should not be cast at all.
+ * 'equiv' means only byte-order changes are allowed.
+ * 'safe' means only casts which can preserve values are allowed.
+ * 'same_kind' means only safe casts or casts within a kind,
+ like float64 to float32, are allowed.
+ * 'unsafe' means any data conversions may be done.
+ copy : bool, optional
+ By default, astype always returns a newly allocated array. If this
+ is set to False and the `dtype` requirement is satisfied, the input
+ array is returned instead of a copy.
+ keep_attrs : bool, optional
+ By default, astype keeps attributes. Set to False to remove
+ attributes in the returned object.
+
+ See also
+ --------
+ np.ndarray.astype
+ dask.array.Array.astype
+ """
+ from .computation import apply_ufunc
+
+ return apply_ufunc(
+ duck_array_ops.astype,
+ self,
+ kwargs=dict(dtype=dtype, casting=casting, copy=copy),
+ keep_attrs=keep_attrs,
+ dask="allowed",
+ )
+
def load(self, **kwargs):
"""Manually trigger loading of this variable's data from disk or a
remote source into memory and return this variable.
diff --git a/xarray/tests/test_dataarray.py b/xarray/tests/test_dataarray.py
index f10404d7..77969e5c 100644
--- a/xarray/tests/test_dataarray.py
+++ b/xarray/tests/test_dataarray.py
@@ -1874,6 +1874,19 @@ class TestDataArray:
bar = Variable(["x", "y"], np.zeros((10, 20)))
assert_equal(self.dv, np.maximum(self.dv, bar))
+ def test_astype_attrs(self):
+ for v in [self.va.copy(), self.mda.copy(), self.ds.copy()]:
+ v.attrs["foo"] = "bar"
+ assert list(v.attrs.items()) == list(v.astype(float).attrs.items())
+ assert [] == list(v.astype(float, keep_attrs=False).attrs.items())
+
+ def test_astype_dtype(self):
+ original = DataArray([-1, 1, 2, 3, 1000])
+ converted = original.astype(float)
+ assert_array_equal(original, converted)
+ assert np.issubdtype(original.dtype, np.integer)
+ assert np.issubdtype(converted.dtype, np.floating)
+
def test_is_null(self):
x = np.random.RandomState(42).randn(5, 6)
x[x < 0] = np.nan
diff --git a/xarray/tests/test_dataset.py b/xarray/tests/test_dataset.py
index da7621dc..9a95eb24 100644
--- a/xarray/tests/test_dataset.py
+++ b/xarray/tests/test_dataset.py
@@ -5607,6 +5607,17 @@ class TestDataset:
np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42)
np.testing.assert_equal(padded["dim2"][[0, -1]].data, np.nan)
+ def test_astype_attrs(self):
+ data = create_test_data(seed=123)
+ data.attrs["foo"] = "bar"
+
+ assert list(data.attrs.items()) == list(data.astype(float).attrs.items())
+ assert list(data.var1.attrs.items()) == list(
+ data.astype(float).var1.attrs.items()
+ )
+ assert [] == list(data.astype(float, keep_attrs=False).attrs.items())
+ assert [] == list(data.astype(float, keep_attrs=False).var1.attrs.items())
+
# Py.test tests
There are some duplicated commits (3x). I'd suggest the squash and push -f...
force-pushed from e23529e to 612acef
I think that was due to a rebase merge (?). I fixed the commit history using cherry-picking and force-pushed to your branch, but there's a backup of the old branch if something went wrong. @dnowacki-usgs, if you want to add new commits you'll first have to force-pull (or delete your local branch and check out the remote branch – just make sure that you don't lose any unpushed commits).
Thanks @dnowacki-usgs. This generally looks great. I've added some minor comments below.
Thanks for the review @dcherian. I modified the tests as suggested and added some docstring updates. Let me know what you think.
Thanks @dnowacki-usgs. This LGTM.
The backcompat code is a little not-nice but I can't think of a better solution. It would be nice to not break things overnight... so thanks!
xarray/core/duck_array_ops.py
try:
    return data.astype(**kwargs)
except TypeError as e:
would it be better to do something like
try:
    import sparse
except ImportError:
    sparse = None

if (
    sparse is not None
    and isinstance(data, sparse.COO)
    and LooseVersion(sparse.__version__) < LooseVersion("0.10.0")
    and "casting" in kwargs
):
    # warn
    kwargs.pop("casting")

return data.astype(**kwargs)
?
That way, we wouldn't call data.astype(**kwargs) twice.
I just saw I was up for reviewing this — sorry for missing that. Looks good. @keewis's suggestion for the back-compat code looks like a nice refinement — it includes a version check which will ease disentangling it. Could we add that and then merge?
Updated the try/except following @keewis's lead. Also, sparse 0.11.0 has been released, so I was able to check that the warning is correctly issued for 0.10.0; then I upgraded, and the warning is no longer issued (and sparse doesn't choke on the casting kwarg).
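A quick way to exercise that is a pytest.warns check along these lines (a minimal sketch, assuming an old sparse release is installed; the object construction is an assumption):

import numpy as np
import pytest
import sparse
import xarray as xr

# Sketch: with sparse <= 0.10.0, astype on a sparse-backed DataArray should
# emit the RuntimeWarning about dropping the unsupported 'casting' argument;
# with newer sparse releases no warning is expected.
arr = xr.DataArray(sparse.COO.from_numpy(np.arange(4.0)))
with pytest.warns(RuntimeWarning, match="does not support the 'casting' argument"):
    arr.astype(int)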
Could you merge in master?
Which command did you use to create the merge? I think that was some sort of rebase merge, which means that all merged commits are included in the commit history, and GitHub doesn't seem to work well with this kind of merge. You also have duplicated commits in the history, which confuses me. Edit: it seems you somehow merged / rebased the branch onto itself. Just as a reference, for merging I usually use
git checkout master
git pull
git checkout <feature-branch>
git merge master
git push
Ugh, sorry about that. I'm following the instructions in contributing to xarray. It's a fetch/rebase. Happy to do it another way if there is a better practice. Thanks.
force-pushed from 653676d to 6d63b05
Ah, now I understand where the duplication is coming from. I'm guessing you did that and then tried to push, but it complained about different commits, so you pulled and then pushed. I fixed it for you. I think we should update that guide; rebasing is usually not a good idea because then you'd have to force-push, too (and GitHub doesn't handle force-pushing that well).
I think it should be
git checkout master
git pull
git checkout <feature-branch>
git merge master
git push
or
git fetch upstream
git merge upstream/master
git push
instead.
Ah, I see.
I've opened a PR to update the guide; hopefully that helps.
Thanks @dnowacki-usgs
This is a do-over of @gajomi's #2070 with minimal additional tests to preserve attrs on Datasets as well as DataArrays.
isort . && black . && mypy . && flake8
whats-new.rst
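As a rough usage sketch of the behaviour this adds (illustrative code, not taken from the PR): attributes now survive astype unless keep_attrs=False is passed.

import xarray as xr

da = xr.DataArray([0, 1, 2], attrs={"units": "m"})

# Attributes are kept by default after this change.
assert da.astype(float).attrs == {"units": "m"}

# The previous behaviour is still available via keep_attrs=False.
assert da.astype(float, keep_attrs=False).attrs == {}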