crc32 function outputs wrong result for large data on the macOS arm64 platform #105967

@maikschulze

Bug report

Both zlib.crc32 and binascii.crc32 show the problematic behavior. When computing the CRC of 2 GB or more of data, macOS arm64 binaries return different values than all other tested platforms (macOS x64, Windows x64, Linux x64). As a consequence, problems arise, for example, when using the zipfile module.

Reproduction:

import random
random.seed(0)

import zlib
import binascii

def chunks(data, n):
    # Yield successive n-byte slices of data.
    for i in range(0, len(data), n):
        yield data[i:i + n]

random_megabyte = random.randbytes(1024*1024)
random_1_gigabyte = random_megabyte * 1024 * 1
random_4_gigabyte = random_megabyte * 1024 * 4

crc_1_gigabyte_zlib = zlib.crc32(random_1_gigabyte, 0)
crc_1_gigabyte_binascii = binascii.crc32(random_1_gigabyte, 0)

crc_4_gigabyte_zlib = zlib.crc32(random_4_gigabyte, 0)
crc_4_gigabyte_binascii = binascii.crc32(random_4_gigabyte, 0)

# Workaround: computing the CRC incrementally in chunks smaller than 2 GB
# (here 1 GB) gives the expected result on macOS arm64
chunked_crc_4_gigabyte_zlib = 0
chunked_crc_4_gigabyte_binascii = 0
for chunk in chunks(random_4_gigabyte, 1024 * 1024 * 1024 * 1):
    chunked_crc_4_gigabyte_zlib = zlib.crc32(chunk, chunked_crc_4_gigabyte_zlib)
    chunked_crc_4_gigabyte_binascii = binascii.crc32(chunk, chunked_crc_4_gigabyte_binascii)

print("crc_1_gigabyte_zlib".ljust(32),             "expected: 0xe28bc234 computed:", hex(crc_1_gigabyte_zlib))
print("crc_1_gigabyte_binascii".ljust(32),         "expected: 0xe28bc234 computed:", hex(crc_1_gigabyte_binascii))
print("crc_4_gigabyte_zlib".ljust(32),             "expected: 0x278432d6 computed:", hex(crc_4_gigabyte_zlib))
print("crc_4_gigabyte_binascii".ljust(32),         "expected: 0x278432d6 computed:", hex(crc_4_gigabyte_binascii))
print("chunked_crc_4_gigabyte_zlib".ljust(32),     "expected: 0x278432d6 computed:", hex(chunked_crc_4_gigabyte_zlib))
print("chunked_crc_4_gigabyte_binascii".ljust(32), "expected: 0x278432d6 computed:", hex(chunked_crc_4_gigabyte_binascii))

Output on macOS arm64:

mac-arm64:crc_bug dev_admin$ /opt/homebrew/bin/python3 crc_bug_report.py 
crc_1_gigabyte_zlib              expected: 0xe28bc234 computed: 0xe28bc234
crc_1_gigabyte_binascii          expected: 0xe28bc234 computed: 0xe28bc234
crc_4_gigabyte_zlib              expected: 0x278432d6 computed: 0x6b54c6be
crc_4_gigabyte_binascii          expected: 0x278432d6 computed: 0x6b54c6be
chunked_crc_4_gigabyte_zlib      expected: 0x278432d6 computed: 0x278432d6
chunked_crc_4_gigabyte_binascii  expected: 0x278432d6 computed: 0x278432d6
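
Since the chunked computation returns the expected value, a small wrapper can serve as a workaround until this is fixed. The following is only a sketch: `crc32_chunked` and its `chunk_size` parameter are hypothetical names, not part of zlib or binascii, and it assumes the input is a bytes-like object with one-byte items (bytes, bytearray, or a byte-format memoryview).

```python
import zlib


def crc32_chunked(data, chunk_size=1 << 30, crc=0):
    """Compute CRC-32 incrementally so each zlib.crc32 call sees < 2 GB.

    Hypothetical helper: the name and the 1 GB default chunk size are
    illustrative assumptions, chosen to match the reproduction above.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        # Feed the running CRC back in as the starting value.
        crc = zlib.crc32(view[start:start + chunk_size], crc)
    return crc


if __name__ == "__main__":
    payload = b"example payload " * 1000
    # On small inputs the chunked and one-shot results must be identical.
    print(crc32_chunked(payload, chunk_size=4096) == zlib.crc32(payload))
```

The same loop works with binascii.crc32, since both functions accept a running CRC as their second argument.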

Your environment

  • CPython versions tested on: Python 3.9.6, Python 3.11.4
  • Operating system and architecture: macOS arm64, macOS x64, Windows x64, Linux x64

Labels

  • OS-mac
  • stdlib: Python modules in the Lib dir
  • type-bug: An unexpected behavior, bug, or error

Projects

Status

Done
