Skip to content

Hazard with ReadBytesExt methods and continuing after errors #208

@eric-seppanen

Description

@eric-seppanen

I don't think this is a bug in byteorder. But I thought it demonstrates an interesting hazard that might be worth documenting for others.

Some byteorder::ReadBytesExt methods rely on std::io::Read::read_exact, which has a warning about the error case:

If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.

... If this function returns an error, it is unspecified how many bytes it has read, but it will never read more than would be necessary to completely fill the buffer.

This means that e.g. byteorder::read_u16 has the same hazard: if it encounters the end of the reader, the reader is now in an unspecified state.

So code like this is wrong:

pub fn parse_packet(packet: &[u8]) -> Vec<u16> {
    let mut result = Vec::new();
    let mut reader = Cursor::new(packet);

    // Read as many u16s as we can
    while let Ok(word) = reader.read_u16::<LittleEndian>() {
        result.push(word);
    }
    // If there was a byte left over, take that too.
    if let Ok(odd_end_val) = reader.read_u8() {
        result.push(odd_end_val.into())
    }
    result
}

I recently stumbled over code that did exactly this, because the standard library changed its (unspecified) behavior. In older versions, the function above behaved as expected (remainder bytes stayed in the reader after the error). In rust 1.80.0, the remainder bytes are consumed. The following unit test demonstrates (comment out the assert for your version):

#[test]
fn test_odd() {
    let input = [1, 2, 3];
    let output = parse_packet(&input);

    // Rust <= 1.79
    assert_eq!(output, [0x0201, 0x0003]);

    // Rust 1.80
    assert_eq!(output, [0x0201]);
}

The byteorder::ReadBytesExt methods are documented to track the read_exact error behavior:

Errors

This method returns the same errors as Read::read_exact.

But maybe more should be done? I'm not sure how often this pattern might exist in the wild. Maybe the docs should be extended to warn people that the input stream can't be relied on after an error is returned?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions