Description
When adding a column to a DataFrame whose MultiIndex has one level with a datetime-like dtype, pandas 1.0.3 explicitly casts the dtype of the values to be added to object in multi.py if the index of the values being set and the frame's index are not identical. Those object-typed values are then converted to Timestamps later on. This consumes a lot of time for big DataFrames.
Comparing pandas 0.22.0 and 1.0.3 yields 0.124 seconds vs. 35.274 seconds on my machine with the following reproducible setup:
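To make the cast described above concrete, here is a minimal sketch of what moving datetime values through object dtype looks like at the user level. It is only an illustration, not the code path inside multi.py; the variable names (`dates`, `as_object`, `roundtrip`) are made up for this example.

```python
import pandas as pd

# Datetime level values, as in the 'date' level of the MultiIndex.
dates = pd.date_range("2020-01-01", periods=5)
print(dates.dtype)  # native datetime dtype

# Casting to object yields one Python-level Timestamp per element.
as_object = dates.astype(object)
print(as_object.dtype, type(as_object[0]))

# Converting the objects to datetimes again has to touch every element.
roundtrip = pd.to_datetime(as_object)
print(roundtrip.dtype)
```

On a large index, this kind of per-element conversion is plausibly where the extra time goes, but the profile output is what actually confirms it.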
# build reproducible setup
import numpy as np
import pandas as pd
iterables = [range(10000), pd.date_range('2020-01-01', periods=200)]
idx = pd.MultiIndex.from_product(iterables, names=['id', 'date'])
df = pd.DataFrame(data=np.random.randn(10000 * 200), index=idx, columns=["value"])
new_col = df[df.index.get_level_values(1) != pd.to_datetime('2020-01-01')] # drop first record of each id
print(df.shape, new_col.shape)
# profile performance of set_item
import cProfile
pr = cProfile.Profile()
pr.enable()
df['new_col'] = new_col['value']
pr.disable()
pr.print_stats(sort='cumulative')  # same ordering as the legacy sort=2
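For a quicker check without reading the full profile, the same assignment can be timed directly. The following is a scaled-down sketch: the sizes `n_ids` and `n_dates` are my own choice so that it finishes quickly on any pandas version, and the absolute numbers will differ from the ones reported above.

```python
import time

import numpy as np
import pandas as pd

# Smaller sizes than in the report so the sketch runs fast (assumed values).
n_ids, n_dates = 200, 50
iterables = [range(n_ids), pd.date_range("2020-01-01", periods=n_dates)]
idx = pd.MultiIndex.from_product(iterables, names=["id", "date"])
df = pd.DataFrame(np.random.randn(n_ids * n_dates), index=idx, columns=["value"])

# Drop the first date of each id so the two indexes are not identical.
mask = df.index.get_level_values("date") != pd.Timestamp("2020-01-01")
new_col = df.loc[mask, "value"]

start = time.perf_counter()
df["new_col"] = new_col  # values are aligned on the MultiIndex
elapsed = time.perf_counter() - start
print(f"assignment took {elapsed:.4f} s")

# Rows that were dropped from new_col have no match and come out as NaN.
n_missing = int(df["new_col"].isna().sum())
print(n_missing, "rows without a value")
```

Running this under both pandas versions, and increasing `n_ids` and `n_dates` toward the original 10000 and 200, should show whether the slowdown scales with the index size.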