Closed
Labels: Bug, Dtype Conversions, Groupby, Numeric Operations
Description
Dear pandas team
My environment is:
numpy==1.9.3
pandas==0.16.2
With this setup, could bug #10172 have been reintroduced? For instance, take a CSV file (mini.csv) with three int columns, generated by the code below:
import random

def contents(f, sc1=10, sc2=1000, cnt=10000):
    # Write cnt rows of "zone,val1,val2"; both values scale with the zone.
    for i in range(cnt):
        zone = random.choice(range(10))
        val1 = random.randint(0, zone * sc1)
        val2 = random.randint(0, zone * sc2)
        f.write("%d,%d,%d\n" % (zone, val1, val2))

with open("mini.csv", "w") as f:
    f.write("zipcode,sqft,price\n")
    contents(f, sc2=1000000, cnt=50000)
Once we load the file into pandas
import pandas
mini = pandas.read_csv("mini.csv")
mini.groupby('zipcode')[['price']].mean()
the mean of the single selected column (price) comes out as int64:
           price
zipcode
0              0
1         499960
2        1005490
3        1465088
4        2001135
5        2495200
6        2993253
7        3569320
8        4076548
9        4416133
but if we add an extra column to the selection,
mini.groupby('zipcode')[['price','sqft']].mean()
the price mean is float64:
                  price       sqft
zipcode
0              0.000000   0.000000
1         499960.563138   5.000400
2        1005490.239062   9.928459
3        1465088.765919  14.922507
4        2001135.222045  20.148962
5        2495200.657097  25.160088
6        2993253.872691  29.624297
7        3569320.017926  35.089428
8        4076548.696706  39.605981
9        4416133.246409  44.851756
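As a quick sanity check on the two outputs above, the int64 values can be compared against the float64 ones. This is only a sketch in plain Python using the numbers printed above (the names `pairs`, `truncated` and `rounded` are just illustrative). It suggests the single-column result is the float mean with its fractional part dropped rather than rounded, which would be consistent with the result being cast back to the input dtype, though it does not prove where the cast happens.

```python
# (int64 price mean, float64 price mean) per zipcode, copied from the outputs above.
pairs = [
    (0, 0.000000),
    (499960, 499960.563138),
    (1005490, 1005490.239062),
    (1465088, 1465088.765919),
    (2001135, 2001135.222045),
    (2495200, 2495200.657097),
    (2993253, 2993253.872691),
    (3569320, 3569320.017926),
    (4076548, 4076548.696706),
    (4416133, 4416133.246409),
]

# Does each integer result equal the float mean with the fraction dropped?
truncated = [i == int(f) for i, f in pairs]

# Does each integer result equal the float mean rounded to the nearest integer?
rounded = [i == round(f) for i, f in pairs]

print("matches truncation:", all(truncated))
print("matches rounding:  ", all(rounded))
```

Every row matches truncation, while rows such as zipcode 1 (499960 vs 499960.563138) do not match rounding, so the fractional part is silently lost in the single-column case.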
Thanks in advance
Cristobal