Skip to content

Add as_bytearray=False flag to decompress. #107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 18, 2016

Conversation

llllllllll
Copy link
Contributor

To make it easier to decompress data into a mutable array without
copying there is a new flag 'as_bytearray' to the decompress
function. This will make decompress return a mutable array of bytes
instead of an immutable collection.

See pandas-dev/pandas#12359 for context on the use case.
About the implementation: I tried having a branch that set two function pointers, one for from_string_and_size and the other for as_string but that showed worse performance than just using CPP to specialize the two branches. If the you think that the slight performance increase is not worth the code complexity I can repush a change without using CPP for the two branches.

@llllllllll
Copy link
Contributor Author

I can reproduce this sporadic failure when I run many times locally, unfortunatly, the stack trace I am getting out of the segfault is:

#0  0x0000000000000000 in ?? ()
#1  0x00007fe53c854290 in blosc_c (context=0x561ee626f0c0, blocksize=blocksize@entry=262144, leftoverblock=leftoverblock@entry=0, 
    ntbytes=ntbytes@entry=0, maxbytes=maxbytes@entry=262160, src=src@entry=0x7fe538abc030 "", dest=0x561ee66a03a0 "", tmp=0x561ee6660360 "")
    at c-blosc/blosc/blosc.c:533
#2  0x00007fe53c85491e in t_blosc (ctxt=0x561ee626c080) at c-blosc/blosc/blosc.c:1483
#3  0x00007fe5416054a4 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fe54134313d in clone () from /usr/lib/libc.so.6

I am running on arch gnu+linux x64 compiling with gcc 5.3.0

@@ -87,19 +87,29 @@ def test_decompress_input_types(self):
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to add a new test for bytearray and keep the existing one for bytes.

@FrancescAlted
Copy link
Member

Bar the comment about adding a new test, this looks good to me. The crash that you are getting is before or after this PR? Can you attach a minimal script that produces the crash?

To make it easier to decompress data into a mutable array without
copying there is a new flag 'as_bytearray' to the decompress
function. This will make decompress return a mutable array of bytes
instead of an immutable collection.
@llllllllll
Copy link
Contributor Author

I split the tests.

I just pulled head of master and can reproduce the the segfault by running the tests a few times. Based on the core dump it looks like it in shuffle after starting a new thread. If there is anything I can do to help debug this issue please let me know.

@FrancescAlted
Copy link
Member

Hmm, could you please update to latest c-blosc and see if the issue goes
away?
El dia 17/02/2016 20:20, "Joe Jevnik" [email protected] va
escriure:

I split the tests.

I just pulled head of master and can reproduce the the segfault by running
the tests a few times. Based on the core dump it looks like it in shuffle
after starting a new thread. If there is anything I can do to help debug
this issue please let me know.


Reply to this email directly or view it on GitHub
#107 (comment).

@llllllllll
Copy link
Contributor Author

Is this different from the c-blosc that is vendored in this repo?

@FrancescAlted
Copy link
Member

Yes.
El dia 17/02/2016 21:02, "Joe Jevnik" [email protected] va
escriure:

Is this different from the c-blosc that is vendored in this repo?


Reply to this email directly or view it on GitHub
#107 (comment).

@llllllllll
Copy link
Contributor Author

I will try that, thanks!

@llllllllll
Copy link
Contributor Author

Windows error seems unrelated:
c-blosc/blosc\bitshuffle-sse2.c(20) : fatal error C1189: #error : SSE2 is not supported by the target

@llllllllll
Copy link
Contributor Author

ping, any idea what we should do about windows?

@FrancescAlted
Copy link
Member

Hi. Sorry about this, but is difficult for me to maintain old version of MSVC 2008 in Windows for Blosc (see this for the reasons). Some love from the community in this regard would be very welcome indeed.

At any rate, from your report, I think you was using Unix in your setup. So going back to the initial question, do 1.7.1 fix your crash?

@llllllllll
Copy link
Contributor Author

I am not getting crashes after rebasing, not sure what to do about windows though. Also I am running arch gnu+linux with gcc 5.3

@FrancescAlted
Copy link
Member

Great. So the quick fix for Python 2 on Windows (and its dependency on MSVC 2008) would be to fall back to c-blosc 1.4.x branch which is known to run well there.

So, I will merge this and then will try to go back to c-blosc 1.4.x.

@FrancescAlted
Copy link
Member

Hmm, before merging, could you please update the RELEASE_NOTES.rst accordingly? Thanks!

FrancescAlted added a commit that referenced this pull request Mar 18, 2016
Add as_bytearray=False flag to decompress.
@FrancescAlted FrancescAlted merged commit b1ba8f7 into Blosc:master Mar 18, 2016
@llllllllll
Copy link
Contributor Author

Sorry, just got in, do you still want me to update the release notes?

@FrancescAlted
Copy link
Member

No need. Just updated them in commit 4933f1e. Hope this is fine with you.

@llllllllll
Copy link
Contributor Author

That looks great, thank you very much!

@FrancescAlted
Copy link
Member

And C-Blosc downgraded to 1.4.5 in 015409e. Hope this will fix the segfaults and revert VS2008 compatibility on Win.

@FrancescAlted
Copy link
Member

Well, due to an oversight, the bump to C-Blosc 1.4.5 was actually completed here: de556a0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants