python - Modifying values of a small slice of a column
I'm trying to center (subtract the average from) a slice of a column. In the following example, I look up the supercase (the variable that groups observations), take the average over that group, and assign, in the same position, the old value minus the average. I'm working with a bigger DataFrame (477 rows × 85 columns), but I made a test df to show my point:
import random as rd
import pandas as pd

# 10-row, 3-column DataFrame of random floats
test = pd.DataFrame([[rd.random() for n in range(3)] for n in range(10)],
                    columns=["var{}".format(n+1) for n in range(3)])
# the supercase column groups observations (rows)
test["supercase"] = [1000]*2 + [2000]*4 + [3000]*3 + [4000]
# random metadata fluff
for n, _lett in zip(range(3), list("abc")):
    test["metadata{}".format(n+1)] = [_lett*int(rd.random()*10) for _ in range(len(test.index))]
# the vars I want to work on
_vars = test.columns[:3]
# list of supercases to work on
supercases = test.supercase.unique()
# go through the calculations
for var in _vars:
    for sc in supercases:
        test[var][test.supercase == sc] = test[var][test.supercase == sc] - test[var][test.supercase == sc].mean()
(I realize that the group with a single observation will have a centered value of zero.)
Nevertheless, after waiting quite a bit (with the original df), I get the following warning:
C:\Python27\lib\site-packages\IPython\kernel\__main__.py:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I wasn't sure what that meant, so I tried creating a copy of the df and making the assignments on the new df:
test_ctr = pd.DataFrame(test)  # to avoid 2 vars pointing to the same object
for var in _vars:
    for sc in supercases:
        test_ctr[var][test_ctr.supercase == sc] = test[var][test.supercase == sc] - test[var][test.supercase == sc].mean()
This made me notice that both test_ctr (as expected) and test were being modified, which made me even more confused.
How should this be done, then? The link above describes the following as the proper way, but that would make me have to save the index values:
dfc.loc[0,'a'] = 11
Is there something I'm missing? Especially in the case of the test df being modified?
Cheers and thanks!
I'm not sure I can give a great explanation of the warning beyond what's in the documentation, but it appears that what you did works fine, and that the warning doesn't really apply in this case even though it appears.
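For what it's worth, the pattern test[var][mask] = ... is chained indexing: test[var] is evaluated first, and the assignment then goes to whatever object that step returned, which is what the warning is about. Depending on the pandas version, such a write may or may not reach the original frame. A minimal sketch of the single-step .loc form follows, assuming a small made-up frame (df and its values are hypothetical, not the question's data). Note that a boolean mask works as the row selector, so no index values need to be saved:

```python
import pandas as pd

# Hypothetical small frame standing in for the question's `test`
df = pd.DataFrame({
    "var1": [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],
    "supercase": [1000, 1000, 2000, 2000, 2000, 3000],
})

for sc in df["supercase"].unique():
    mask = df["supercase"] == sc  # boolean row selector for this group
    # A single .loc call picks rows and column together, so the write
    # targets df itself instead of an intermediate object
    df.loc[mask, "var1"] = df.loc[mask, "var1"] - df.loc[mask, "var1"].mean()

print(df)
```

The loop structure is kept the same as in the question on purpose; only the indexing step changes.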
Nevertheless, there's a faster and easier way to do what you want with groupby(), and it is close to an example in the documentation here.
test[['var1','var2','var3','supercase']]

       var1      var2      var3  supercase
0  0.107989  0.275314  0.688784       1000
1  0.743372  0.726421  0.457137       1000
2  0.946661  0.469229  0.145584       2000
3  0.562564  0.040528  0.150148       2000
4  0.213042  0.934673  0.713870       2000
5  0.851200  0.371629  0.239308       2000
6  0.555617  0.502027  0.862414       3000
7  0.386040  0.954245  0.392592       3000
8  0.431534  0.088997  0.016639       3000
9  0.207693  0.269625  0.189688       4000

test.groupby('supercase')[_vars].transform( lambda x: x - x.mean() )

       var1      var2      var3
0 -0.317692 -0.225554  0.115823
1  0.317692  0.225554 -0.115823
2  0.303294  0.015214 -0.166643
3 -0.080803 -0.413487 -0.162079
4 -0.430325  0.480658  0.401643
5  0.207833 -0.082386 -0.072920
6  0.097887 -0.013063  0.438533
7 -0.071691  0.439156 -0.031290
8 -0.026196 -0.426092 -0.407242
9  0.000000  0.000000  0.000000
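The transform() call above returns a new frame and leaves test unchanged. A minimal sketch of writing the centered values back, assuming a small made-up frame (df, cols, and the numbers are hypothetical):

```python
import pandas as pd

# Hypothetical frame: two value columns, a grouping column, one metadata column
df = pd.DataFrame({
    "var1": [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],
    "var2": [10.0, 20.0, 0.0, 3.0, 6.0, 7.0],
    "supercase": [1000, 1000, 2000, 2000, 2000, 3000],
    "metadata1": list("abcdef"),
})
cols = ["var1", "var2"]

# transform() returns a result aligned with df's index, one row per input row
centered = df.groupby("supercase")[cols].transform(lambda x: x - x.mean())

# Assign the centered columns back in one step; other columns are untouched
df[cols] = centered

print(df)
```

An equivalent spelling that avoids the lambda is df[cols] - df.groupby("supercase")[cols].transform("mean"), which may be faster on larger frames.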
In terms of getting a copy of a DataFrame, this is the standard way:
test_ctr = test.copy()
I would have guessed that what you tried, test_ctr = pd.DataFrame(test), would have worked, but apparently not!
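A quick way to check that copy() gives an independent object is sketched below, assuming a tiny made-up frame (original and independent are hypothetical names). Wrapping an existing frame in pd.DataFrame(...) may share the underlying data depending on the pandas version, which would be consistent with what the question observed, so it is not used here:

```python
import pandas as pd

# Hypothetical frame for the check
original = pd.DataFrame({"var1": [1.0, 2.0, 3.0],
                         "supercase": [1000, 1000, 2000]})

independent = original.copy()      # deep copy by default (deep=True)
independent.loc[0, "var1"] = 99.0  # modify only the copy

print(original.loc[0, "var1"])     # value in the original frame
print(independent.loc[0, "var1"])  # value in the copy
```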