PERF: lexsort_depth #47511

MarcoGorelli · 2022-06-26T13:50:39Z

closes #xxxx (Replace xxxx with the Github issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Currently, to find the lexsort_depth, is_lexsorted is repeatedly called.

This is unnecessary, it should be possible to keep track of the lexsort depth whilst looping through the elements. In particular, if the first level isn't lexsorted, and there are many levels, the repeatedly calling is_lexsorted is slow

Timing example:

import pstats, cProfile

import pandas as pd
import numpy as np

n = 200
df = pd.DataFrame(np.ones((3, n)))
df.loc[1, 1] = 2
df = df.set_index(df.columns.tolist()).copy()

def main():
    df.index._lexsort_depth


cProfile.runctx("main()", globals(), locals(), "Profile.prof")

s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats('time').print_stats()

On main:

         221516 function calls in 0.647 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    20100    0.155    0.000    0.562    0.000 _dtype.py:328(_name_get)
    20100    0.102    0.000    0.264    0.000 numerictypes.py:356(issubdtype)
    40200    0.097    0.000    0.142    0.000 numerictypes.py:282(issubclass_)
    80400    0.087    0.000    0.087    0.000 {built-in method builtins.issubclass}
      200    0.083    0.000    0.644    0.003 {pandas._libs.algos.is_lexsorted}
    20100    0.063    0.000    0.327    0.000 _dtype.py:314(_name_includes_bit_suffix)
    20100    0.033    0.000    0.033    0.000 {method 'format' of 'str' objects}
    20100    0.026    0.000    0.026    0.000 _dtype.py:24(_kind_name)
        1    0.001    0.001    0.647    0.647 multi.py:3841(_lexsort_depth)
      200    0.001    0.000    0.001    0.000 {pandas._libs.algos.ensure_int64}

On this branch:

         2417 function calls in 0.008 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    0.002    0.000    0.006    0.000 _dtype.py:328(_name_get)
      200    0.001    0.000    0.003    0.000 numerictypes.py:356(issubdtype)
      400    0.001    0.000    0.001    0.000 numerictypes.py:282(issubclass_)
        1    0.001    0.001    0.007    0.007 {pandas._libs.algos.lexsort_depth}
      800    0.001    0.000    0.001    0.000 {built-in method builtins.issubclass}
      200    0.001    0.000    0.001    0.000 {pandas._libs.algos.ensure_int64}
      200    0.001    0.000    0.003    0.000 _dtype.py:314(_name_includes_bit_suffix)
      200    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    0.001    0.001 multi.py:3843(<listcomp>)
      200    0.000    0.000    0.000    0.000 _dtype.py:24(_kind_name)

Note that if the index's codes are already lexsorted (e.g. by commenting out df.loc[1, 1] = 2), this doesn't make a difference (0.008 seconds in both cases)

I think there's alraedy a whatsnew note that could cover this, but I'll check again later

EDIT might be better to use this instead of is_lexsorted to avoid duplication, will check this soon

jbrockmendel · 2022-06-26T23:52:43Z

pandas/_libs/algos.pyx

+        assert arr.dtype.name == 'int64'
+        vecs[i] = <int64_t*>cnp.PyArray_DATA(arr)
+
+    # Assume uniqueness??


is this a comment meaning "check if we should assume uniqueness" or more along the lines of "TODO: should we assume uniqueness?" if the latter, can you make that explicit

I copied this from is_lexsorted, but it's a comment which was originally added in 49675be back in 2011. I think it's intended as a todo (i.e. if some levels are duplicated, then maybe it's more efficient to not check each one separately), but should be fine to remove

jbrockmendel · 2022-06-26T23:53:30Z

pandas/_libs/algos.pyx

+                    # No need to check levels for which we know input isn't lexsorted.
+                    break
+                cur = vecs[k][i]
+                pre = vecs[k][i -1]


whitespace around -

jbrockmendel · 2022-06-26T23:55:04Z

pandas/core/indexes/multi.py

-    for k in range(nlevels, 0, -1):
-        if libalgos.is_lexsorted(int64_codes[:k]):
-            return k
-    return 0


does the perf boost come from moving the loop into cython or from using a different algorithm?

different algorithm - the main one first checks whether all levels are lexsorted, then checks whether the first
n_levels-1 are lexsorted, then checks whether the first n_levels-2 are lexsorted, etc. So it requires going
through the whole array many times. The algo in this branch just needs to go through the array once, keeping
track of the depth along the way, and exiting early if possible.

jbrockmendel · 2022-06-26T23:55:58Z

Do we still need is_lexsorted?

MarcoGorelli · 2022-06-27T15:06:39Z

Do we still need is_lexsorted?

is_lexsorted would still be more efficient if all we're checking is whether something is lexsorted, as opposed to
what the depth is
example:

In [11]: list_of_arrays = [np.ones((100_000,), dtype='int64') for _ in range(2)]

In [12]: list_of_arrays[1][1] = 2

In [13]: %%timeit
    ...: lexsort_depth(list_of_arrays)
    ...: 
    ...: 
655 µs ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [14]: %%timeit
    ...: is_lexsorted(list_of_arrays)
    ...: 
    ...: 
15.2 µs ± 17 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

jreback · 2022-07-03T15:03:42Z

does this help with any algos generally? e.g. try some indexing / multiindex benchs and see if any change

MarcoGorelli · 2022-07-10T09:22:38Z

It makes a noticeable difference if there's a wide multiindex, the rest doesn't change much. ASV runs (with a wide multiindex benchmark added):

asv continuous -f 1.1 upstream/main 39d90f729b7b3b94bedccb2b1930b6d50f5d6827 -b ^multiindex_object

run 1:

       before           after         ratio
     [e915b0a4]       [39d90f72]
     <main>           <perf-lexsort-depth>
-        49.2±1μs         26.2±1μs     0.53  multiindex_object.GetLoc.time_wide_get_loc

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

run 2:

       before           after         ratio
     [e915b0a4]       [39d90f72]
     <main>           <perf-lexsort-depth>
-        52.0±1μs       26.9±0.2μs     0.52  multiindex_object.GetLoc.time_wide_get_loc

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

run 3:

       before           after         ratio
     [e915b0a4]       [39d90f72]
     <main>           <perf-lexsort-depth>
-      50.6±0.7μs       27.2±0.7μs     0.54  multiindex_object.GetLoc.time_wide_get_loc

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Reckon it's worth it, or is it too little benefit to justify adding code?

MarcoGorelli · 2022-07-10T10:06:23Z

Reckon it's worth it, or is it too little benefit to justify adding code?

TBH I think this is too little benefit - #47459 already improved is_lexsorted, and MultiIndexes don't typically have many levels, so this isn't worth adding extra code to have to maintain

Apologies for having taken up your time with this

perf: lexsort_depth

294a997

MarcoGorelli marked this pull request as ready for review June 26, 2022 18:12

MarcoGorelli requested a review from jbrockmendel June 26, 2022 18:12

MarcoGorelli added the Performance Memory or execution speed performance label Jun 26, 2022

jbrockmendel reviewed Jun 26, 2022

View reviewed changes

remove uniqueness comment and extra whitespace

f945f12

jreback added this to the 1.5 milestone Jul 3, 2022

MarcoGorelli marked this pull request as draft July 8, 2022 07:19

MarcoGorelli added 8 commits July 9, 2022 10:00

Merge remote-tracking branch 'upstream/main' into perf-lexsort-depth

9b92b74

add mi_wide asv benchmark

b7760e8

better types

a706d9b

use while loop

3d6c562

another while loop

b2be554

just check if depth

3f1bcaf

use != 0

0b33bff

revert

39d90f7

MarcoGorelli marked this pull request as ready for review July 10, 2022 09:23

MarcoGorelli closed this Jul 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

PERF: lexsort_depth #47511

PERF: lexsort_depth #47511

Uh oh!

MarcoGorelli commented Jun 26, 2022 •

edited

Loading

Uh oh!

jbrockmendel Jun 26, 2022

Uh oh!

MarcoGorelli Jun 27, 2022

Uh oh!

jbrockmendel Jun 26, 2022

Uh oh!

jbrockmendel Jun 26, 2022

Uh oh!

MarcoGorelli Jun 27, 2022

Uh oh!

jbrockmendel commented Jun 26, 2022

Uh oh!

MarcoGorelli commented Jun 27, 2022

Uh oh!

jreback commented Jul 3, 2022

Uh oh!

MarcoGorelli commented Jul 10, 2022

Uh oh!

MarcoGorelli commented Jul 10, 2022

Uh oh!

Uh oh!

Uh oh!

PERF: lexsort_depth #47511

PERF: lexsort_depth #47511

Uh oh!

Conversation

MarcoGorelli commented Jun 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbrockmendel Jun 26, 2022

Choose a reason for hiding this comment

Uh oh!

MarcoGorelli Jun 27, 2022

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Jun 26, 2022

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Jun 26, 2022

Choose a reason for hiding this comment

Uh oh!

MarcoGorelli Jun 27, 2022

Choose a reason for hiding this comment

Uh oh!

jbrockmendel commented Jun 26, 2022

Uh oh!

MarcoGorelli commented Jun 27, 2022

Uh oh!

jreback commented Jul 3, 2022

Uh oh!

MarcoGorelli commented Jul 10, 2022

Uh oh!

MarcoGorelli commented Jul 10, 2022

Uh oh!

Uh oh!

MarcoGorelli commented Jun 26, 2022 •

edited

Loading