Description
Feature or enhancement
Proposal:
Code reading data in pure Python tends to make a buffer variable, call os.read(), which returns a separate newly allocated buffer of data, then copy/append that data onto the pre-allocated buffer [0]. That creates unnecessary extra buffer objects, as well as unnecessary copies. Provide os.readinto for directly filling a Buffer Protocol object.
os.readinto should closely mirror _Py_read, which underlies os.read, in order to get the same behaviors around retries as well as well-tested cross-platform support.
Move simple cases that use os.read (ex. [0]) to use the new API when it makes code simpler and more efficient. Potentially adding readinto to more readable/writeable file-like proxy objects or objects which transform the data (ex. Lib/_compression) is out of scope for this issue.
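As a rough illustration of the difference, the sketch below fills a caller-owned bytearray from a pipe. It assumes os.readinto(fd, buffer) returns the number of bytes read, as in Python 3.14 and later; the readinto_compat and demo names are hypothetical, and the fallback branch exists only so the sketch runs on older interpreters (it performs exactly the extra allocation and copy described above).

```python
import os


def readinto_compat(fd, buffer):
    """Fill `buffer` from `fd`; return the number of bytes read (0 at EOF)."""
    if hasattr(os, "readinto"):  # assumed available in Python 3.14+
        return os.readinto(fd, buffer)
    # Fallback for older interpreters: os.read() allocates a new bytes
    # object, which is then copied into the caller's buffer.
    view = memoryview(buffer).cast("B")
    data = os.read(fd, len(view))
    view[:len(data)] = data
    return len(data)


def demo():
    """Write a short message to a pipe and read it back into a bytearray."""
    r, w = os.pipe()
    try:
        os.write(w, b"hello")
        buf = bytearray(16)          # caller-owned, pre-allocated buffer
        n = readinto_compat(r, buf)  # fills buf in place
        return bytes(buf[:n])
    finally:
        os.close(r)
        os.close(w)
```

The caller decides how large the buffer is and can reuse it across reads, which is where the savings would come from.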
[0]
Lines 1914 to 1921 in 298dda5
# Wait for exec to fail or succeed; possibly raising an
# exception (limited in size)
errpipe_data = bytearray()
while True:
    part = os.read(errpipe_read, 50000)
    errpipe_data += part
    if not part or len(errpipe_data) > 50000:
        break
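One way the loop above might look with a single pre-allocated buffer is sketched here. This is not the code from the linked PR; read_limited and _readinto are hypothetical names, and the sketch caps the result at exactly the limit, whereas the original loop can overshoot it by one read. It assumes os.readinto exists (Python 3.14+) and falls back to os.read otherwise.

```python
import os


def _readinto(fd, buffer):
    """Hypothetical shim: use os.readinto when present, else copy from os.read."""
    if hasattr(os, "readinto"):  # assumed available in Python 3.14+
        return os.readinto(fd, buffer)
    data = os.read(fd, len(buffer))
    buffer[:len(data)] = data
    return len(data)


def read_limited(fd, limit=50_000):
    """Read from fd until EOF or until `limit` bytes have been collected."""
    data = bytearray(limit)  # one allocation up front
    pos = 0
    with memoryview(data) as view:
        while pos < limit:
            # Slicing a memoryview does not copy; it is a window onto `data`.
            count = _readinto(fd, view[pos:])
            if count == 0:  # EOF
                break
            pos += count
    # The view is released at this point, so the bytearray can be shrunk.
    del data[pos:]
    return data
```

Trimming with del after the view is released avoids a final copy of the collected bytes.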
cpython/Lib/multiprocessing/forkserver.py
Lines 384 to 392 in 298dda5
def read_signed(fd):
    data = b''
    length = SIGNED_STRUCT.size
    while len(data) < length:
        s = os.read(fd, length - len(data))
        if not s:
            raise EOFError('unexpected EOF')
        data += s
    return SIGNED_STRUCT.unpack(data)[0]
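A fixed-size read like read_signed maps naturally onto a pre-sized buffer. The following is a minimal sketch, not the merged change: SIGNED_STRUCT is a stand-in defined here as struct.Struct("q") (assumed to match forkserver's definition), _readinto is a hypothetical shim, and os.readinto is assumed to exist only on Python 3.14+.

```python
import os
import struct

SIGNED_STRUCT = struct.Struct("q")  # stand-in; assumed to match forkserver's


def _readinto(fd, buffer):
    """Hypothetical shim: use os.readinto when present, else copy from os.read."""
    if hasattr(os, "readinto"):  # assumed available in Python 3.14+
        return os.readinto(fd, buffer)
    data = os.read(fd, len(buffer))
    buffer[:len(data)] = data
    return len(data)


def read_signed(fd):
    """Read exactly SIGNED_STRUCT.size bytes from fd and unpack one integer."""
    data = bytearray(SIGNED_STRUCT.size)
    unread = memoryview(data)  # shrinking window over the unfilled tail
    while unread:
        count = _readinto(fd, unread)
        if count == 0:
            raise EOFError("unexpected EOF")
        unread = unread[count:]
    return SIGNED_STRUCT.unpack(data)[0]
```

Compared with the bytes-concatenation version, short reads no longer create a new bytes object per iteration; the loop only advances the window.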
Lines 1695 to 1701 in 298dda5
def readinto(self, b):
    """Same as RawIOBase.readinto()."""
    m = memoryview(b).cast('B')
    data = self.read(len(m))
    n = len(data)
    m[:n] = data
    return n
os.read loops to migrate
Well contained os.read loops
- multiprocessing.forkserver read_signed - @cmaloney - gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- [x] subprocess Popen._execute_child - @cmaloney - gh-129205: Use os.readinto() in subprocess errpipe_read #129498
os.read loop interleaved with other code
- _pyio FileIO.read FileIO.readall FileIO.readinto: see Reduce copies when reading files in pyio, match behavior of _io #129005 -- @cmaloney
- _pyrepl.unix_console UnixConsole.input_buffer -- fixed length underlying buffer with "pos" / window on top.
- pty _copy. Operates around a "high waterlevel" / attempt to have a fixed-ish size buffer. Wraps os.read with a _read function.
- subprocess Popen.communicate. Note, this feels like something non-contiguous Py_buffer would be really good for, particularly in self.text_mode where currently all the bytes are "copied" into a contiguous bytes to then turn into text...
- tarfile _Stream._read and _Stream.__read. Note, builds _LowLevelFile around os.read, but other read methods are also available.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
Linked PRs
- gh-129205: Add os.readinto API for reading data into a caller provided buffer #129211
- gh-129205: Modernize test_eintr #129316
- gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- gh-129205: Use os.readinto() in subprocess errpipe_read #129498
- gh-129205: Experiment BytesIO._readfrom() #130098