Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame([
[ 2024, 1, 7, 11, 42, 13],
[ 2024, 9, 19, 11, 54, 20],
[ 2024, 9, 17, 1, 22, 0],
[ 2024, 1, 24, 21, 59, 55],
[ 2024, 6, 15, 12, 27, 30],
[ 2024, 9, 26, 23, 58, 26],
[ 2024, 6, 6, 0, 19, 59],
[ 2024, 1, 8, 2, 7, 43],
[ 2024, 2, 16, 16, 20, 13],
[ 2024, 12, 22, 23, 54, 4]])
df.columns = ['year', 'month', 'day', 'hour', 'minute', 'second']
ts = pd.to_datetime(df, utc=True)
ts32 = pd.to_datetime(df.astype('float32'), utc=True)
ts64 = pd.to_datetime(df.astype('float64'), utc=True)
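# Per-row difference; non-zero entries are the rows that come out a day off.
print(ts - ts32)
assert ts.equals(ts64)
assert ts.equals(ts32)  # fails on pandas 2.2.3: float32 input yields different dates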
Issue Description
When a datetime is constructed from the six-column format (year, month, day, hour, minute, second) and the data is stored as 32-bit floats, pandas.to_datetime silently produces wrong results that are off by one day.
pandas.to_datetime should either produce correct results or raise an exception. Correct results would be preferred :)
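One possible explanation, which this report does not confirm: if to_datetime combines year, month, and day into a single YYYYMMDD number while the values are still float32, the result no longer fits in float32's 24-bit significand. The sketch below assumes that encoding and uses only the standard library to round the first row's date (2024-01-07) to single precision. The helper `to_float32` is hypothetical and exists only for this illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Assumed YYYYMMDD encoding of the first row, 2024-01-07, as an exact integer.
encoded = 2024 * 10000 + 1 * 100 + 7

as_f64 = float(encoded)       # float64 represents integers exactly up to 2**53
as_f32 = to_float32(as_f64)   # float32 is exact only up to 2**24 = 16777216

print(int(as_f64))  # 20240107
print(int(as_f32))  # 20240108: odd values above 2**24 cannot be represented
```

Between 2**24 and 2**25, float32 can only hold even integers, so an odd day moves to a neighbouring day. If this is the cause, casting the frame to float64 or to an integer dtype before calling to_datetime should avoid the problem.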
Expected Behavior
The reproducible example above should run without an AssertionError: ts32 and ts64 should both equal ts, and ts - ts32 should be zero for every row.
Installed Versions
pandas : 2.2.3
numpy : 2.1.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.28.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None