BUG: fixes Arrow Dataframes/Series producing a Numpy object result #54025

Merged
merged 14 commits into from
Jul 18, 2023

mcgeestocks
Contributor

@mcgeestocks mcgeestocks commented Jul 6, 2023

Notes:

The dtype of the object returned by .dot() is now the result of dtypes.cast.find_common_type().

It's worth noting that the new object's dtype is of higher precision, but this seems to line up with what's expected: https://arrow.apache.org/docs/python/pandas.html#pandas-arrow-conversion
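A minimal sketch of the behavior described in the notes above, assuming a pandas build that includes this change (the PR is on the 2.1 milestone). The frames, column names, and values are illustrative and not taken from the PR. It uses a masked dtype so that pyarrow is not required. The printed dtype depends on the installed version: the common type of the operands is expected with the fix, and object without it.

```python
import pandas as pd

cols = ["a", "b"]

# Left operand: plain NumPy int32. Right operand: masked (nullable) Float32.
df_a = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=cols, dtype="int32")
df_b = pd.DataFrame([[1, 0], [0, 1]], index=cols, dtype="Float32")

# df_a's columns match df_b's index, so the matrix product is well defined.
result = df_a.dot(df_b)

# The numeric values are the ordinary matrix product on any version.
values = result.to_numpy(dtype="float64")
print(values)

# The resulting dtype is version dependent (see the lead-in above).
print(result.dtypes)
```

Because the right operand is an identity matrix, the values match df_a, which makes it easy to confirm that only the dtype handling is affected.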

@mcgeestocks mcgeestocks marked this pull request as ready for review July 7, 2023 14:20
@mcgeestocks mcgeestocks marked this pull request as draft July 7, 2023 15:17
@mcgeestocks mcgeestocks marked this pull request as ready for review July 12, 2023 20:45
@mcgeestocks
Contributor Author

Hey @phofl do you know who I should tag for a review on this?

Member

@mroeschke mroeschke left a comment

Could you add a test for masked types as well? I suspect they failed before too.

@mroeschke mroeschke added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Arrow (pyarrow functionality) labels Jul 15, 2023
@mcgeestocks
Contributor Author

Could you add a test for masked types as well? I suspect they failed before too.

Sure! I assumed that by masked types you meant Int8, Int16, Int32, ...

After I updated the tests, I noticed another potential edge case: when calling dot with two different masked dtypes, for example Float64 and float[pyarrow], I get a return dtype of object.

How would we want to handle this case?

@mroeschke
Member

After I updated the tests, I noticed another potential edge case: when calling dot with two different masked dtypes, for example Float64 and float[pyarrow], I get a return dtype of object.

Let's ignore this case for now

@mroeschke mroeschke added this to the 2.1 milestone Jul 18, 2023
@mroeschke mroeschke merged commit f77a0e6 into pandas-dev:main Jul 18, 2023
@mroeschke
Member

Thanks @mcgeestocks

Labels
Arrow (pyarrow functionality), Numeric Operations (Arithmetic, Comparison, and Logical operations)
Development

Successfully merging this pull request may close these issues.

BUG: dot on Arrow Dataframes/Series produces a Numpy object result
2 participants