Skip to content

Use utf-8 encoding by default for all subprocess communication. #10558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 25, 2020

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Feb 24, 2020

This seems to mostly be a problem on windows. There is even a PEP
out to make this the default: https://www.python.org/dev/peps/pep-0597

Fixes: #10551

This seems to mostly be a problem on windows. There is even a PEP
out to make this the default: https://www.python.org/dev/peps/pep-0597

Fixes: #10551
@sbc100 sbc100 requested a review from kripken February 24, 2020 22:11
@sbc100 sbc100 merged commit 7b9b09a into master Feb 25, 2020
@sbc100 sbc100 deleted the utf8_by_default branch February 25, 2020 04:55
@juj
Copy link
Collaborator

juj commented Feb 25, 2020

This broke Closure compiler (testing on Windows):

E:\code\emsdk\emscripten\master>emcc tests\hello_world.c -O1 --closure 1 -s INCLUDE_FULL_LIBRARY=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0
warning: undefined symbol: rand
warning: undefined symbol: memmove
warning: undefined symbol: realloc
warning: undefined symbol: htons
warning: undefined symbol: ntohs
warning: undefined symbol: emscripten_builtin_malloc
warning: undefined symbol: emscripten_builtin_free
warning: undefined symbol: emscripten_builtin_memalign
warning: undefined symbol: memalign
warning: undefined symbol: glOrtho
warning: undefined symbol: emscripten_GetAlcProcAddress
warning: undefined symbol: emscripten_GetAlProcAddress
warning: undefined symbol: emscripten_GetProcAddress
warning: undefined symbol: sleep
warning: undefined symbol: memcmp
warning: undefined symbol: pthread_self
shared:ERROR: Closure compiler run failed:

shared:ERROR: ERROR - [JSC_READ_ERROR] Cannot read file C:/Users/jukkaj/AppData/Local/Temp/emscripten_temp_i2mkkx71/a.out.wasm.o.js: Failed to read: C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_i2mkkx71\a.out.wasm.o.js, is this input UTF-8 encoded?

1 error(s), 0 warning(s)

shared:ERROR: closure compiler failed (rc: 1.)

See also google/closure-compiler#3556

Commenting out the line kw.setdefault('encoding', 'utf-8') lets the above command line pass successfully.

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 25, 2020

Hmm, sadly I don't have a windows box right now to try this on, and I also fear that its would depend on the local of the windows box.

It looks to me like closure is expecting system-default-encoded files, rather than UTF-8. I imagine that running closure with encoding=None (on python3-only) would solve the problem. Can you verify?

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 25, 2020

Oh wait.. its not the stdin/stdout of closure but a file on disk... it seems like maybe the stdout of a subprocess was connected to a file file? I'm not sure what compiler stages do this. Is this a fastcomp failure? Fastcomp-only?

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 28, 2020

I'm trying to repro this issue but failing.

What does sys.getdefaultencoding() report on your system?

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 28, 2020

Also what does locale.getpreferredencoding() report? I'm trying to repro this on my windows machine.

@kripken
Copy link
Member

kripken commented Mar 3, 2020

This is blocking us from tagging a new release, but I'm not sure what we should do here: it seems like we have possible breakage either without this PR or with it.

Perhaps the old breakage is less risky, as it may have existed for a long time before discovery, while this breakage was found promptly (and so may be found more times by others)?

@sbc100
Copy link
Collaborator Author

sbc100 commented Mar 3, 2020

I've tried to reproduce this on windows and failed. I'm happy to revert this but I honestly don't know if #10551 is likely to be common.

We did make the switch to python3 two weeks ago now so I would have though more windows users would have reported #10551 if it was common. So yes I guess we revert this since and cut a release, since #10551 seems to have been caused by emsdk and will be happening regardless of whether we release a new emscripten version.

kripken added a commit that referenced this pull request Mar 4, 2020
kripken added a commit that referenced this pull request Mar 4, 2020
…n. (#10558)" (#10639)

Reverts #10558

That PR broke closure on windows as reported there. We did
not manage to reproduce that failure, but also did not manage
to reproduce the failure that led to that PR in the first place, so
the safe thing is to revert that change and investigate more.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build Failure with UnicodeDecodeError
3 participants