Python Pandas: How to groupby and compare columns
Here is my DataFrame 'df':

    match          name                   group
    adamant        adamant home network   86
    adamant        adamant, ltd.          86
    adamant bild   tov adamant-bild       86
    360works       360works               94
    360works       360works.com           94
For each group number, I want to compare the names one by one and see whether each is matched to the same word in the 'match' column.
So the desired output is counts: if it is a match, count it as 'tp'; if not, count it as 'fn'.
I had the idea of counting the number of match words per group number, but that is not quite what I want:

    df.groupby('group').count()

Does anybody have an idea how to do it?
If I understood your unclear question correctly, this should work:
    import re
    import pandas

    df = pandas.DataFrame([['adamant', 'adamant home network', 86],
                           ['adamant', 'adamant, ltd.', 86],
                           ['adamant bild', "tov adamant-bild", 86],
                           ['360works', '360works', 94],
                           ['360works ', "360works.com ", 94]],
                          columns=['match', 'name', 'group'])

    def letters_only(text):
        # keep only the ASCII letters of a string, lowercased
        return ''.join(re.findall("[a-zA-Z]+", text)).lower()

    def my_function(group):
        for i, row in group.iterrows():
            # parse the names in each column and look for inclusion
            if letters_only(row['match']) not in letters_only(row['name']):
                # if one of the inclusions fails, return 'fn'
                return 'fn'
        # if all inclusions succeed, return 'tp'
        return 'tp'

    res_series = df.groupby('group').apply(my_function)
    res_series.name = 'count'
    res_df = res_series.reset_index()
    print(res_df)
This gives the DataFrame:

       group count
    0     86    tp
    1     94    tp
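The function above returns a single 'tp' or 'fn' label per group. If the goal is instead the number of 'tp' and 'fn' rows within each group, as the question seems to ask, a per-row variant might look like the sketch below. The helper name `letters_only`, the `label` column, and the use of `pandas.crosstab` are my own choices for illustration, not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [
        ["adamant", "adamant home network", 86],
        ["adamant", "adamant, ltd.", 86],
        ["adamant bild", "tov adamant-bild", 86],
        ["360works", "360works", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)


def letters_only(text):
    # Keep only ASCII letters and lowercase them (same cleaning as the answer).
    return "".join(re.findall("[a-zA-Z]+", text)).lower()


# Label every row: 'tp' if the cleaned 'match' text occurs inside the
# cleaned 'name' text, otherwise 'fn'.
is_match = [
    letters_only(m) in letters_only(n) for m, n in zip(df["match"], df["name"])
]
df["label"] = ["tp" if ok else "fn" for ok in is_match]

# Count labels per group; a label that never occurs shows up as 0.
counts = pd.crosstab(df["group"], df["label"]).reindex(
    columns=["tp", "fn"], fill_value=0
)
print(counts)
```

With the sample data every row matches, so group 86 gets 3 'tp' and group 94 gets 2, with 0 'fn' in both.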