Skip to content

CLN/BUG: Better validation of levels/labels/names #5209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 14, 2013

Conversation

jtratner
Copy link
Contributor

Improved validation of levels, labels and names (plus fleshed out test
cases). Fixes #4794.

Changed an assert_almost_equal to assert_series_equal to better reflect
actual return value.

Also makes set_levels, set_labels and set_names match behavior of rename to not return self if inplace.

@jreback
Copy link
Contributor

jreback commented Oct 13, 2013

look ok to me

@jtratner
Copy link
Contributor Author

took a bit of wrangling to test correctly given the slightly weird internal representation of levels and labels. Do you think this needs to test that levels are all integers? I feel like most of the time people should not be calling MI directly.

@jtratner
Copy link
Contributor Author

note that 0.12 and earlier wasn't doing that either.

@jreback
Copy link
Contributor

jreback commented Oct 14, 2013

u could prob validate that it's int64 dtype
I agree users shouldn't really touch mi internals

@jtratner
Copy link
Contributor Author

I wonder if I can just call Int64Index and that'll be good enough? I'll try

@jreback
Copy link
Contributor

jreback commented Oct 14, 2013

I guess there are internal pandas methods that may create labels/levels (that re not in index) - eg reshape/ concat

@jtratner
Copy link
Contributor Author

huh? but I think the internal representation depends on levels being integers no? It's really simple actually, just change _ensure_index to _ensure_int64index. Going to see what happens.

@jtratner
Copy link
Contributor Author

of course it's labels that's actually supposed to be integer :P.

@jtratner
Copy link
Contributor Author

I'm going to leave it - there's a lot of complexity with checking that it's not a float and such, given that it's internal-ish (and we don't really have a good fastpath yet), probably better to just not figure out something complicated to check. Unless you feel strongly, I'll leave it for 0.14 and the index refactor (where it should be easier to reason about and check). Internal calls all use _ensure_int64 anyways.

@@ -385,9 +389,6 @@ def assert_almost_equal(a, b, check_less_precise=False):
np.testing.assert_(isiterable(b))
na, nb = len(a), len(b)
assert na == nb, "%s != %s" % (na, nb)
# TODO: Figure out why I thought this needed instance cheacks...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for posterity - I think this was because sometimes assert_almost_equal was being passed a Series. but no part of the library deals with that anymore anyways.

@jtratner
Copy link
Contributor Author

@jreback okay?

@jreback
Copy link
Contributor

jreback commented Oct 14, 2013

yep

jtratner added a commit that referenced this pull request Oct 14, 2013
CLN/BUG: Better validation of levels/labels/names
@jtratner jtratner merged commit e686c5f into pandas-dev:master Oct 14, 2013
@jtratner jtratner deleted the validate-everything-on-mi branch October 14, 2013 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: MultiIndex needs to validate length of levels on setting.
2 participants