From 65d9cf61165c168f06f9209c7c7f93d52d9858ab Mon Sep 17 00:00:00 2001
From: Patrick Hoefler
Date: Sat, 4 Mar 2023 21:24:55 +0100
Subject: [PATCH 1/2] Add section about copy on write and mutating objects

---
 doc/source/user_guide/copy_on_write.rst | 35 +++++++++++++++++++++----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/doc/source/user_guide/copy_on_write.rst b/doc/source/user_guide/copy_on_write.rst
index 94dde9a6ffd70..0d52393c7dbaa 100644
--- a/doc/source/user_guide/copy_on_write.rst
+++ b/doc/source/user_guide/copy_on_write.rst
@@ -6,11 +6,6 @@
 Copy-on-Write (CoW)
 *******************
 
-.. ipython:: python
-    :suppress:
-
-    pd.options.mode.copy_on_write = True
-
 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
 optimizations that become possible through CoW are implemented and supported. A complete list
 can be found at :ref:`Copy-on-Write optimizations `.
@@ -21,6 +16,36 @@ CoW will lead to more predictable behavior since it is not possible to update mo
 one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
 delaying copies as long as possible, the average performance and memory usage will improve.
 
+Previous behavior
+-----------------
+
+pandas indexing behavior is tricky to understand. Some operations return views while
+other return copies. Depending on the result of the operation, mutation one object
+might accidentally mutate another:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    subset = df["foo"]
+    subset.iloc[0] = 100
+    df
+
+Mutating subset, e.g. updating it's values, also updates df. The exact behavior is
+hard to predict. Copy-on-Write solves accidentally modifying more than one object,
+it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
+
+.. ipython:: python
+
+    pd.options.mode.copy_on_write = True
+
+    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+    subset = df["foo"]
+    subset.iloc[0] = 100
+    df
+
+The following sections will explain what this means and how it impacts existing
+applications.
+
 Description
 -----------
 

From a8c1e724badf068162ba7b7123758059c348db95 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Wed, 15 Mar 2023 21:16:08 +0100
Subject: [PATCH 2/2] Update doc/source/user_guide/copy_on_write.rst

---
 doc/source/user_guide/copy_on_write.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/user_guide/copy_on_write.rst b/doc/source/user_guide/copy_on_write.rst
index 0d52393c7dbaa..d16b41decd57a 100644
--- a/doc/source/user_guide/copy_on_write.rst
+++ b/doc/source/user_guide/copy_on_write.rst
@@ -30,7 +30,7 @@ might accidentally mutate another:
     subset.iloc[0] = 100
     df
 
-Mutating subset, e.g. updating it's values, also updates df. The exact behavior is
+Mutating ``subset``, e.g. updating its values, also updates ``df``. The exact behavior is
 hard to predict. Copy-on-Write solves accidentally modifying more than one object,
 it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
 
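
Note on the behavior documented by this series: the idea that a write to ``subset`` leaves ``df`` untouched can be modeled with a minimal pure-Python sketch of copy-on-write semantics. This is a toy model under stated assumptions, not how pandas implements CoW; the class ``CowArray`` and its ``view`` and ``tolist`` methods are hypothetical names introduced only for illustration.

```python
class CowArray:
    """Toy copy-on-write wrapper (hypothetical; not pandas internals)."""

    def __init__(self, values, _refs=None):
        self._values = values
        # Every object sharing one buffer also shares this list of referrers.
        self._refs = _refs if _refs is not None else []
        self._refs.append(self)

    def view(self):
        # No data is copied here: the new object points at the same list.
        return CowArray(self._values, self._refs)

    def __setitem__(self, index, value):
        if len(self._refs) > 1:
            # The buffer is shared, so detach and copy before writing.
            self._refs.remove(self)
            self._values = list(self._values)
            self._refs = [self]
        self._values[index] = value

    def tolist(self):
        return list(self._values)


df_col = CowArray([1, 2, 3])
subset = df_col.view()   # shares data with df_col, nothing copied yet
subset[0] = 100          # first write triggers the copy

print(df_col.tolist())   # [1, 2, 3]
print(subset.tolist())   # [100, 2, 3]
```

The copy is deferred until the first write to a shared buffer, so reads and views stay cheap while a mutation can never reach a second object. That mirrors the ``df``/``subset`` example added in patch 1/2.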