Open
Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations), Performance (Memory or execution speed performance)
Description
In places like equals methods and array_equivalent, we do things like (left == right).all() or ((left == right) | (isna(left) & isna(right))).all(). For large arrays that are not equal, we can do much better with something like:
def all_match(left, right) -> bool:
    if left.dtype.kind != "i":
        # viewing as i8 will make NaNs be treated as equal
        return _all_match_i8(left.view("i8"), right.view("i8"))
    return _all_match_i8(left, right)

cdef bint _all_match_i8(const int64_t[:] left, const int64_t[:] right):
    cdef:
        Py_ssize_t i, n = len(left)

    for i in range(n):
        if left[i] != right[i]:
            return False
    return True
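To make the comment about viewing as i8 concrete, here is a small NumPy-only sketch of what the reinterpretation does to comparisons. It is an illustration rather than part of the proposal: the variable names are made up, and it assumes 8-byte float data. It also shows one caveat as I understand it: a bitwise comparison is stricter than float equality for signed zeros.

```python
import numpy as np

left = np.array([1.0, np.nan, 3.0])
right = left.copy()

# Elementwise float comparison: NaN != NaN, so the arrays look unequal.
float_equal = bool((left == right).all())

# Reinterpreting the same 8-byte values as int64 compares raw bit patterns,
# so NaNs with identical bit patterns compare equal.
bits_equal = bool((left.view("i8") == right.view("i8")).all())

# Caveat: 0.0 and -0.0 are equal as floats but differ in their sign bit,
# so the int64 view reports them as different.
pos_zero = np.array([0.0])
neg_zero = np.array([-0.0])
zeros_float_equal = bool((pos_zero == neg_zero).all())
zeros_bits_equal = bool((pos_zero.view("i8") == neg_zero.view("i8")).all())

print(float_equal, bits_equal, zeros_float_equal, zeros_bits_equal)
```

So the i8 view gives NaN-aware equality only when the NaNs share a bit pattern, and it would treat 0.0 and -0.0 as a mismatch, which may or may not be acceptable for a given caller.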
Some profiling results:
In [2]: arr = np.arange(10**6)
In [3]: arr2 = arr.copy()
In [4]: arr2[0] = -1
In [5]: %timeit np.array_equal(arr, arr2)
831 µs ± 42.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit all_match(arr, arr2)
1.27 µs ± 58.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit np.array_equal(arr, arr)
416 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [8]: %timeit all_match(arr, arr)
812 µs ± 5.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So in cases that short-circuit early, we can get massive speedups, but this implementation is actually 2x slower in cases that don't short-circuit (for reasons that are not clear to me).
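One possible middle ground, sketched here as an assumption rather than something that has been benchmarked, is to compare in fixed-size blocks: each block still uses NumPy's vectorized comparison, and the loop stops at the first block containing a mismatch. The helper name all_match_chunked and the default chunk_size are hypothetical, and the sketch assumes 1-D arrays.

```python
import numpy as np


def all_match_chunked(left, right, chunk_size=4096):
    """Hypothetical helper: block-wise equality check for 1-D arrays.

    Returns False as soon as one block differs, so an early mismatch
    avoids scanning the rest of the data.
    """
    if left.shape != right.shape:
        return False

    n = len(left)
    for start in range(0, n, chunk_size):
        stop = start + chunk_size
        # Slices are views, so no data is copied before the comparison.
        if not np.array_equal(left[start:stop], right[start:stop]):
            return False
    return True


arr = np.arange(10**6)
arr2 = arr.copy()
arr2[0] = -1

early_mismatch = all_match_chunked(arr, arr2)  # differs in the first block
identical = all_match_chunked(arr, arr)        # must scan every block
```

Whether this closes the gap in the non-short-circuiting case would need profiling. The chunk size trades per-call overhead against how much work is wasted after the first mismatch.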