diff --git a/doc/source/user_guide/copy_on_write.rst b/doc/source/user_guide/copy_on_write.rst
index 94dde9a6ffd70..d16b41decd57a 100644
--- a/doc/source/user_guide/copy_on_write.rst
+++ b/doc/source/user_guide/copy_on_write.rst
@@ -6,11 +6,6 @@
 Copy-on-Write (CoW)
 *******************
 
-.. ipython:: python
-    :suppress:
-
-    pd.options.mode.copy_on_write = True
-
 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
 optimizations that become possible through CoW are implemented and supported. A complete list
 can be found at :ref:`Copy-on-Write optimizations `.
@@ -21,6 +16,36 @@ CoW will lead to more predictable behavior since it is not possible to update mo
 one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
 delaying copies as long as possible, the average performance and memory usage will improve.
 
+Previous behavior
+-----------------
+
+pandas indexing behavior is tricky to understand. Some operations return views while
+others return copies. Depending on the result of the operation, mutating one object
+might accidentally mutate another:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    subset = df["foo"]
+    subset.iloc[0] = 100
+    df
+
+Mutating ``subset``, e.g. updating its values, also updates ``df``. The exact behavior is
+hard to predict. Copy-on-Write prevents accidentally modifying more than one object;
+it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
+
+.. ipython:: python
+
+    pd.options.mode.copy_on_write = True
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    subset = df["foo"]
+    subset.iloc[0] = 100
+    df
+
+The following sections will explain what this means and how it impacts existing
+applications.
+
 Description
 -----------