Transposing one column in python pandas with the simplest index possible -
i have following data (data_current
):
import pandas pd import numpy np data_current=pd.dataframe({'medicine':['green tea','fried tomatoes','meditation','meditation'],'disease':['acne','hypertension', 'cancer','lupus']}) data_current
what transpose 1 of columns, instead of having multiple rows same medicine , different diseases have 1 row each medicine several columns diseases. important keep index simple possible, i.e. 0,1,2... i.e. don't want assign 'medicines' index column because merge on other key. so, need data_needed
data_needed=pd.dataframe({'medicine':['green tea','fried tomatoes','meditation'],'disease_1':['acne','hypertension','cancer'], 'disease_2':['np.nan','np.nan','lupus']}) data_needed
here's 1 achieve output
firstly, groupby
on medicine
, disease
list
in [368]: md = (data_current.groupby('medicine') .apply(lambda x: x['disease'].tolist()) .reset_index()) in [369]: md out[369]: medicine 0 0 fried tomatoes [hypertension] 1 green tea [acne] 2 meditation [cancer, lupus]
then convert lists in column separate columns
in [370]: dval = pd.dataframe(md[0].tolist(), ) in [371]: dval out[371]: 0 1 0 hypertension none 1 acne none 2 cancer lupus
now, can concat
-- md
dval
in [372]: md = md.drop(0, axis=1) in [373]: data_final = pd.concat([md, dval], axis=1)
and, rename columns want.
in [374]: data_final.columns = ['medicine', 'disease_1', 'disease_2'] in [375]: data_final out[375]: medicine disease_1 disease_2 0 fried tomatoes hypertension none 1 green tea acne none 2 meditation cancer lupus
Comments
Post a Comment