-
Notifications
You must be signed in to change notification settings - Fork 76
Add as_bytearray=False flag to decompress. #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ea6e1ed
to
1933889
Compare
I can reproduce this sporadic failure when I run many times locally, unfortunatly, the stack trace I am getting out of the segfault is:
I am running on arch gnu+linux x64 compiling with gcc 5.3.0 |
@@ -87,19 +87,29 @@ def test_decompress_input_types(self): | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to add a new test for bytearray and keep the existing one for bytes.
Bar the comment about adding a new test, this looks good to me. The crash that you are getting is before or after this PR? Can you attach a minimal script that produces the crash? |
1933889
to
1852462
Compare
To make it easier to decompress data into a mutable array without copying there is a new flag 'as_bytearray' to the decompress function. This will make decompress return a mutable array of bytes instead of an immutable collection.
I split the tests. I just pulled head of master and can reproduce the the segfault by running the tests a few times. Based on the core dump it looks like it in |
Hmm, could you please update to latest c-blosc and see if the issue goes
|
Is this different from the |
Yes.
|
I will try that, thanks! |
Windows error seems unrelated: |
ping, any idea what we should do about windows? |
Hi. Sorry about this, but is difficult for me to maintain old version of MSVC 2008 in Windows for Blosc (see this for the reasons). Some love from the community in this regard would be very welcome indeed. At any rate, from your report, I think you was using Unix in your setup. So going back to the initial question, do 1.7.1 fix your crash? |
I am not getting crashes after rebasing, not sure what to do about windows though. Also I am running arch gnu+linux with gcc 5.3 |
Great. So the quick fix for Python 2 on Windows (and its dependency on MSVC 2008) would be to fall back to c-blosc 1.4.x branch which is known to run well there. So, I will merge this and then will try to go back to c-blosc 1.4.x. |
Hmm, before merging, could you please update the RELEASE_NOTES.rst accordingly? Thanks! |
Add as_bytearray=False flag to decompress.
Sorry, just got in, do you still want me to update the release notes? |
No need. Just updated them in commit 4933f1e. Hope this is fine with you. |
That looks great, thank you very much! |
And C-Blosc downgraded to 1.4.5 in 015409e. Hope this will fix the segfaults and revert VS2008 compatibility on Win. |
Well, due to an oversight, the bump to C-Blosc 1.4.5 was actually completed here: de556a0 |
To make it easier to decompress data into a mutable array without
copying there is a new flag 'as_bytearray' to the decompress
function. This will make decompress return a mutable array of bytes
instead of an immutable collection.
See pandas-dev/pandas#12359 for context on the use case.
About the implementation: I tried having a branch that set two function pointers, one for
from_string_and_size
and the other foras_string
but that showed worse performance than just using CPP to specialize the two branches. If the you think that the slight performance increase is not worth the code complexity I can repush a change without using CPP for the two branches.