BUG: read_sql reading duplicate tz aware columns #53311

Merged 7 commits on May 31, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -401,6 +401,7 @@ I/O
 - Bug in :func:`read_hdf` not properly closing store after a ``IndexError`` is raised (:issue:`52781`)
 - Bug in :func:`read_html`, style elements were read into DataFrames (:issue:`52197`)
 - Bug in :func:`read_html`, tail texts were removed together with elements containing ``display:none`` style (:issue:`51629`)
+- Bug in :func:`read_sql` when reading multiple timezone aware columns with the same column name (:issue:`44421`)
 - Bug when writing and reading empty Stata dta files where dtype information was lost (:issue:`46240`)

 Period
4 changes: 2 additions & 2 deletions pandas/io/sql.py
@@ -131,13 +131,13 @@ def _parse_date_columns(data_frame, parse_dates):
     # we want to coerce datetime64_tz dtypes for now to UTC
     # we could in theory do a 'nice' conversion from a FixedOffset tz
     # GH11216
-    for col_name, df_col in data_frame.items():
+    for i, (col_name, df_col) in enumerate(data_frame.items()):
         if isinstance(df_col.dtype, DatetimeTZDtype) or col_name in parse_dates:
             try:
                 fmt = parse_dates[col_name]
             except TypeError:
                 fmt = None
-            data_frame[col_name] = _handle_date_column(df_col, format=fmt)
+            data_frame.isetitem(i, _handle_date_column(df_col, format=fmt))

     return data_frame
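
The change in this hunk swaps label-based assignment for positional assignment. A minimal sketch of why that matters when column names repeat, using a hand-built frame as a stand-in for a self-join result (the data is an illustrative assumption, and `pd.to_datetime` stands in for the internal `_handle_date_column` helper):

```python
import pandas as pd

# Stand-in for a self-join result: both column names appear twice.
# The values are illustrative and do not come from a real database.
df = pd.DataFrame(
    [[1, "2021-01-01T00:00:00Z", 1, "2021-01-01T00:00:00Z"]],
    columns=["id", "created_dt"] * 2,
)

# A label lookup on a duplicated name returns a DataFrame holding both
# columns, so the label alone cannot identify one column to overwrite.
assert df["created_dt"].shape == (1, 2)

# Positional replacement: items() yields one Series per column, in order,
# and isetitem(i, ...) replaces exactly the column at position i.
for i, (col_name, col) in enumerate(df.items()):
    if col_name == "created_dt":
        df.isetitem(i, pd.to_datetime(col, utc=True))

print(df.dtypes)
```

Each `created_dt` column is converted independently, and the duplicated column labels are left untouched. `DataFrame.isetitem` requires pandas 1.5 or later.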

38 changes: 38 additions & 0 deletions pandas/tests/io/test_sql.py
@@ -2890,6 +2890,44 @@ def test_schema_support(self):
         res2 = pdsql.read_table("test_schema_other2")
         tm.assert_frame_equal(res1, res2)

+    def test_self_join_date_columns(self):
+        # GH 44421
+        from sqlalchemy.engine import Engine
+        from sqlalchemy.sql import text
+
+        create_table = text(
+            """
+        CREATE TABLE person
+        (
+            id serial constraint person_pkey primary key,
+            created_dt timestamp with time zone
+        );
+
+        INSERT INTO person
+            VALUES (1, '2021-01-01T00:00:00Z');
+        """
+        )
+        if isinstance(self.conn, Engine):
+            with self.conn.connect() as con:
+                with con.begin():
+                    con.execute(create_table)
+        else:
+            with self.conn.begin():
+                self.conn.execute(create_table)
+
+        sql_query = (
+            'SELECT * FROM "person" AS p1 INNER JOIN "person" AS p2 ON p1.id = p2.id;'
+        )
+        result = pd.read_sql(sql_query, self.conn)
+        expected = DataFrame(
+            [[1, Timestamp("2021", tz="UTC")] * 2], columns=["id", "created_dt"] * 2
+        )
+        tm.assert_frame_equal(result, expected)
+
+        # Cleanup
+        with sql.SQLDatabase(self.conn, need_transaction=True) as pandasSQL:
+            pandasSQL.drop_table("person")
+

 # -----------------------------------------------------------------------------
 # -- Test Sqlite / MySQL fallback