Closed
Labels
Error Reporting: Incorrect or improved errors from pandas
Indexing: Related to indexing on series/frames, not to indexes themselves
Performance: Memory or execution speed performance
Description
Code Sample, a copy-pastable example if possible
>>> import numpy as np
>>> import pandas as pd
>>> dates = pd.date_range('2011-1-1', periods=500000, freq='min')
>>> index = np.random.choice(dates, 500000, replace=True)
>>> df = pd.DataFrame(index=index, data={'a':1})
>>> df_sort = df.sort_index()
>>> %timeit df['2011-6-11']
1.19 ms ± 77.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> # Sorted is three times faster
>>> %timeit df_sort['2011-6-11']
333 µs ± 17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> # Now with .loc
>>> %timeit df.loc['2011-6-11']
2.59 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> # Sorted .loc is ~2500x slower than the sorted [] lookup above
>>> %timeit df_sort.loc['2011-6-11']
853 ms ± 29.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> # Slicing works fine
>>> %timeit df.loc['2011-6-11':'2011-10-1']
52 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df_sort.loc['2011-6-11':'2011-10-1']
658 µs ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Problem description
When using the .loc indexer on a large frame with a sorted DatetimeIndex, selecting a single day by partial string is ~2500 times slower than the plain indexing operator ([]) on the same frame. It is also ~300 times slower than the same .loc lookup on the unsorted frame.
Slicing with .loc appears to work as expected.
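Since the slice timings above look healthy, one possible workaround is to express the single-day lookup as a slice whose start and end are the same day. The following is a minimal sketch under that assumption, not a fix for the underlying issue. The helper name `select_day_by_slice` is hypothetical, the frame is smaller than the one in the report, and whether the slice form actually avoids the slowdown on a given pandas version has not been measured here.

```python
import numpy as np
import pandas as pd


def select_day_by_slice(frame, day):
    """Select all rows for one calendar day via a label slice.

    Hypothetical helper: assumes ``frame`` has a sorted DatetimeIndex,
    so that ``frame.loc[day:day]`` covers the whole day.
    """
    if not frame.index.is_monotonic_increasing:
        raise ValueError("expected a sorted DatetimeIndex")
    # Partial-string slicing includes both endpoints, so the same day
    # as start and stop spans 00:00 through the end of that day.
    return frame.loc[day:day]


# Smaller variant of the frame from the report (assumed sizes).
rng = np.random.default_rng(0)
dates = pd.date_range("2011-01-01", periods=50_000, freq="min")
index = rng.choice(dates, 50_000, replace=True)
df_sort = pd.DataFrame(index=index, data={"a": 1}).sort_index()

by_slice = select_day_by_slice(df_sort, "2011-01-11")
by_label = df_sort.loc["2011-01-11"]
```

Both selections should contain the same rows, so the slice form can stand in for the single-label form wherever the index is known to be sorted.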
Expected Output
The sorted frame should be faster than the unsorted one when using .loc.