Skip to content

gh-136599: Improve long_hash #136600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions Lib/test/test_long.py
Original file line number Diff line number Diff line change
Expand Up @@ -1693,5 +1693,22 @@ class MyInt(int):
# GH-117195 -- This shouldn't crash
object.__sizeof__(1)

def test_hash(self):
# gh-136599
self.assertEqual(hash(-1), -2)
self.assertEqual(hash(0), 0)
self.assertEqual(hash(10), 10)

self.assertEqual(hash(sys.hash_info.modulus - 2), sys.hash_info.modulus - 2)
self.assertEqual(hash(sys.hash_info.modulus - 1), sys.hash_info.modulus - 1)
self.assertEqual(hash(sys.hash_info.modulus), 0)
self.assertEqual(hash(sys.hash_info.modulus + 1), 1)

self.assertEqual(hash(-sys.hash_info.modulus - 2), -2)
self.assertEqual(hash(-sys.hash_info.modulus - 1), -2)
self.assertEqual(hash(-sys.hash_info.modulus), 0)
self.assertEqual(hash(-sys.hash_info.modulus + 1), - (sys.hash_info.modulus - 1))


if __name__ == "__main__":
unittest.main()
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Improve performance of :class:`int` hash calculations.
15 changes: 14 additions & 1 deletion Objects/longobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -3676,7 +3676,20 @@ long_hash(PyObject *obj)
}
i = _PyLong_DigitCount(v);
sign = _PyLong_NonCompactSign(v);
x = 0;

// unroll first two digits
#if ( PyHASH_BITS > PyLong_SHIFT )
assert(i>=2);
--i;
x = v->long_value.ob_digit[i];
assert(x < _PyHASH_MODULUS);
#endif
#if ( PyHASH_BITS > (2 * PyLong_SHIFT) )
--i;
x = ((x << PyLong_SHIFT));
x += v->long_value.ob_digit[i];
assert(x < _PyHASH_MODULUS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(x < _PyHASH_MODULUS);
if (x >= _PyHASH_MODULUS)
x -= _PyHASH_MODULUS;

This assert will trigger for values like hash(2**31) for 32bit targets which use the default #define PYLONG_BITS_IN_DIGIT 30

cpython/Include/pyport.h

Lines 132 to 138 in 0d4fd10

/* PYLONG_BITS_IN_DIGIT describes the number of bits per "digit" (limb) in the
* PyLongObject implementation (longintrepr.h). It's currently either 30 or 15,
* defaulting to 30. The 15-bit digit option may be removed in the future.
*/
#ifndef PYLONG_BITS_IN_DIGIT
#define PYLONG_BITS_IN_DIGIT 30
#endif

because
typedef Py_ssize_t Py_hash_t;

and thus PyHASH_BITS is 31
#if SIZEOF_VOID_P >= 8
# define PyHASH_BITS 61
#else
# define PyHASH_BITS 31
#endif
#define PyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1)

I suggest to add more tests to test_long_hash around these corner cases for such build configurations:

>>> hash(-2**31 - 2)
-3
>>> hash(-2**31 - 1)
-2
>>> hash(-2**31)
-2
>>> hash(2**31)
1
>>> hash(2**31 + 1)
2
>>> hash(2**31 + 2)
3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check that hash(-2**31) is not -1. For the others I has a bit hesitant as the values are implementation details (document as implementation details here: https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types). To add tests I would have to get the (private) value of PyHASH_MODULUS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, your link also mentions that PyHASH_MODULUS is available via https://docs.python.org/3/library/sys.html#sys.hash_info.modulus, which is e.g. used in

_PyHASH_MODULUS = sys.hash_info.modulus

and some other tests. But yeah, seems to be an implementation detail. Don't know for sure whether it's ok to use it in tests - but I think so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyHASH_MODULUS is not private anymore. And it was documented in the algorithm description before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "implementation details" are fully documented and PyHASH_MODULUS is indeed public. So I agree we need tests for the corner cases. I rewrote the tests to use sys.hash_info.modulus.

In test_numeric_tower.py there are more tests for the hash of ints, but these tests added here in this PR are not included there directly (only indirectly, e.g. by comparing the value of hash(n) with hash(float(n)).

#endif
while (--i >= 0) {
/* Here x is a quantity in the range [0, _PyHASH_MODULUS); we
want to compute x * 2**PyLong_SHIFT + v->long_value.ob_digit[i] modulo
Expand Down
Loading