
BUG: Passing dataframe to a function creates an inplace update to the original dataframe #35859

Closed
@trenton3983

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

```python
import pandas as pd
from typing import Tuple

# sample data
data = {'Date': [pd.Timestamp('2019-01-31'), pd.Timestamp('2019-02-01'), pd.Timestamp('2019-02-04'),
                 pd.Timestamp('2019-02-05'), pd.Timestamp('2019-02-06'), pd.Timestamp('2019-02-07'),
                 pd.Timestamp('2019-02-08'), pd.Timestamp('2019-02-11'), pd.Timestamp('2019-02-12'),
                 pd.Timestamp('2019-02-13'), pd.Timestamp('2019-02-14')],
        'Close': [166.44000244140625, 166.52000427246094, 171.25, 174.17999267578125,
                  174.24000549316406, 170.94000244140625, 170.41000366210938, 169.42999267578125,
                  170.88999938964844, 170.17999267578125, 170.8000030517578]}

# create dataframe
aapl = pd.DataFrame(data)


def find_trend(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:
    data['sma'] = data['Close'].rolling(period).mean()  # this creates an inplace update to aapl
    diff = data['sma'] - data['sma'].shift(1)  # calculates a series of values
    greater_than_0 = diff > 0  # creates a series of bools
    return diff, greater_than_0


aapl['value'], aapl['trend'] = find_trend(aapl, 4)
```
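Python passes the DataFrame object itself to the function, not a copy, so `data` inside `find_trend` and `aapl` outside it are the same object, and a column assigned through one name is visible through the other. A minimal sketch of that, using a hypothetical `add_column` helper and a tiny frame instead of the data above:

```python
import pandas as pd


def add_column(data: pd.DataFrame) -> int:
    # Hypothetical helper: assigning a column mutates the object that was passed in
    data['doubled'] = data['x'] * 2
    # Return the identity of the object the function actually received
    return id(data)


df = pd.DataFrame({'x': [1, 2, 3]})
received_id = add_column(df)

print(received_id == id(df))  # True: the function received the caller's object
print(list(df.columns))       # ['x', 'doubled']: the caller's frame gained a column
```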

Current Output

  • Note that an `sma` column has been added to `aapl`.
  • Is this the expected behavior?
|    | Date                |   Close |     sma |       value | trend   |
|---:|:--------------------|--------:|--------:|------------:|:--------|
|  0 | 2019-01-31 00:00:00 |  166.44 | nan     | nan         | False   |
|  1 | 2019-02-01 00:00:00 |  166.52 | nan     | nan         | False   |
|  2 | 2019-02-04 00:00:00 |  171.25 | nan     | nan         | False   |
|  3 | 2019-02-05 00:00:00 |  174.18 | 169.597 | nan         | False   |
|  4 | 2019-02-06 00:00:00 |  174.24 | 171.548 |   1.95      | True    |
|  5 | 2019-02-07 00:00:00 |  170.94 | 172.653 |   1.105     | True    |
|  6 | 2019-02-08 00:00:00 |  170.41 | 172.443 |  -0.209999  | False   |
|  7 | 2019-02-11 00:00:00 |  169.43 | 171.255 |  -1.1875    | False   |
|  8 | 2019-02-12 00:00:00 |  170.89 | 170.417 |  -0.837502  | False   |
|  9 | 2019-02-13 00:00:00 |  170.18 | 170.227 |  -0.190002  | False   |
| 10 | 2019-02-14 00:00:00 |  170.8  | 170.325 |   0.0974998 | True    |

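Problem description

Calling `find_trend(aapl, 4)` adds an `sma` column to `aapl`, because the assignment `data['sma'] = ...` inside the function is applied to the DataFrame that was passed in. Only the `value` and `trend` columns were expected to be added to `aapl`.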

Expected Output

|    | Date                |   Close |       value | trend   |
|---:|:--------------------|--------:|------------:|:--------|
|  0 | 2019-01-31 00:00:00 |  166.44 | nan         | False   |
|  1 | 2019-02-01 00:00:00 |  166.52 | nan         | False   |
|  2 | 2019-02-04 00:00:00 |  171.25 | nan         | False   |
|  3 | 2019-02-05 00:00:00 |  174.18 | nan         | False   |
|  4 | 2019-02-06 00:00:00 |  174.24 |   1.95      | True    |
|  5 | 2019-02-07 00:00:00 |  170.94 |   1.105     | True    |
|  6 | 2019-02-08 00:00:00 |  170.41 |  -0.209999  | False   |
|  7 | 2019-02-11 00:00:00 |  169.43 |  -1.1875    | False   |
|  8 | 2019-02-12 00:00:00 |  170.89 |  -0.837502  | False   |
|  9 | 2019-02-13 00:00:00 |  170.18 |  -0.190002  | False   |
| 10 | 2019-02-14 00:00:00 |  170.8  |   0.0974998 | True    |
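For reference, `value` is the row-to-row change in the rolling mean, which reduces to the change in `Close` over `period` rows divided by `period`. Writing $C_t$ for `Close` at row $t$ and $p$ for `period` (notation introduced here, not from the original code):

$$
\mathrm{SMA}_t = \frac{1}{p}\sum_{i=0}^{p-1} C_{t-i},
\qquad
\mathrm{value}_t = \mathrm{SMA}_t - \mathrm{SMA}_{t-1} = \frac{C_t - C_{t-p}}{p}.
$$

With $p = 4$, row 4 gives $(174.24 - 166.44)/4 = 1.95$ and row 5 gives $(170.94 - 166.52)/4 = 1.105$, which match the `value` column up to float rounding.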

Resolves Issue

  • Changing the function as follows avoids the in-place update to `aapl`, because the rolling mean is kept in a local Series instead of being written to a new column of `data`:
```python
def find_trend(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:
    sma = data['Close'].rolling(period).mean()  # does not create an inplace update to aapl
    diff = sma - sma.shift(1)  # calculates a series of values
    greater_than_0 = diff > 0  # creates a series of bools
    return diff, greater_than_0
```
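If the function should keep a helper column, two other common ways to leave the caller's frame untouched are copying it first with `DataFrame.copy()` or building a new frame with `DataFrame.assign()`, which returns a new object. A sketch under those assumptions; the names `find_trend_copy` and `find_trend_assign` and the small `frame` are illustrative, not from the report:

```python
import pandas as pd
from typing import Tuple


def find_trend_copy(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:
    # Work on a copy so the column assignment below does not reach the caller
    data = data.copy()
    data['sma'] = data['Close'].rolling(period).mean()
    diff = data['sma'].diff()  # same as data['sma'] - data['sma'].shift(1)
    return diff, diff > 0


def find_trend_assign(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:
    # assign() returns a new DataFrame; `data` itself is left unchanged
    with_sma = data.assign(sma=data['Close'].rolling(period).mean())
    diff = with_sma['sma'].diff()
    return diff, diff > 0


frame = pd.DataFrame({'Close': [1.0, 2.0, 4.0, 8.0, 16.0]})
value, trend = find_trend_copy(frame, 2)
value2, trend2 = find_trend_assign(frame, 2)
print(list(frame.columns))  # ['Close']: no 'sma' column was added
```

Copying costs memory proportional to the frame, so for large inputs the local-Series version above, or `assign()`, may be preferable.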

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: 0.9.0
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1
