python - Pivoting a pandas dataframe with duplicate index values

March 15, 2011

I have a data frame that has a row for each user joining my site and for each purchase they make.

+---+-----+--------------------+---------+--------+-----+
|   | uid |        msg         |  _time  | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 |   1 | confirmed_settings | 1/29/15 | m      |  37 |
| 1 |   1 | sale               | 4/13/15 | m      |  37 |
| 2 |   3 | confirmed_settings | 4/19/15 | m      |  35 |
| 3 |   4 | confirmed_settings | 2/21/15 | m      |  21 |
| 4 |   5 | confirmed_settings | 3/28/15 | m      |  18 |
| 5 |   4 | sale               | 3/15/15 | m      |  21 |
+---+-----+--------------------+---------+--------+-----+

I want to change the dataframe so that each row is a unique uid and there are columns called sale and confirmed_settings that hold the timestamp of each action. Note that not every user has a sale, but every user has a confirmed_settings. Like below:

+---+-----+--------------------+---------+---------+--------+-----+
|   | uid | confirmed_settings |  sale   |  _time  | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 |   1 | 1/29/15            | 4/13/15 | 1/29/15 | m      |  37 |
| 1 |   3 | 4/19/15            | null    | 4/19/15 | m      |  35 |
| 2 |   4 | 2/21/15            | 3/15/15 | 2/21/15 | m      |  21 |
| 3 |   5 | 3/28/15            | null    | 3/28/15 | m      |  18 |
+---+-----+--------------------+---------+---------+--------+-----+

To do this, I am trying:

df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')

But I get the error: ValueError: Index contains duplicate entries, cannot reshape

How can I pivot a df with duplicate index values to transform my dataframe?
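For context, pivot raises this ValueError when two or more rows share the same (uid, msg) pair, because each pair has to map to exactly one cell. The sample table above contains no such pair, so the real data presumably does. A minimal sketch of how the offending rows could be located and removed, assuming hypothetical data in which uid 1 has two sale rows, and assuming that keeping the first occurrence of each pair is acceptable:

```python
import pandas as pd

# Hypothetical data (not from the question): uid 1 has two 'sale' rows,
# which is the kind of repeat that makes pivot fail.
df = pd.DataFrame({
    "uid": [1, 1, 1, 3],
    "msg": ["confirmed_settings", "sale", "sale", "confirmed_settings"],
    "_time": ["1/29/15", "4/13/15", "5/02/15", "4/19/15"],
})

# keep=False flags every row that belongs to a repeated (uid, msg) pair.
dupes = df[df.duplicated(subset=["uid", "msg"], keep=False)]
print(dupes)

# Assumption: the first occurrence of each pair is the one to keep.
deduped = df.drop_duplicates(subset=["uid", "msg"], keep="first")

# With unique (uid, msg) pairs, pivot can reshape without error.
wide = deduped.pivot(index="uid", columns="msg", values="_time")
print(wide)
```

Which duplicate to keep (first, last, earliest timestamp) depends on what the repeated events mean in the real data, so the keep="first" choice here is only illustrative.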

Edit:

df1 = df.pivot_table(index='uid', columns='msg', values='_time').reset_index()

gives the error DataError: No numeric types to aggregate, so I'm not sure this is the right path to go on.
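The DataError is consistent with pivot_table using a mean aggregation by default, which cannot be applied to the string timestamps in _time. One possible way around it is to pass a non-numeric aggregation. The sketch below rebuilds the sample data from the question and uses aggfunc='first', assuming that when a (uid, msg) pair repeats, the first row in the frame is the one wanted:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "uid": [1, 1, 3, 4, 5, 4],
    "msg": ["confirmed_settings", "sale", "confirmed_settings",
            "confirmed_settings", "confirmed_settings", "sale"],
    "_time": ["1/29/15", "4/13/15", "4/19/15", "2/21/15", "3/28/15", "3/15/15"],
    "gender": ["m"] * 6,
    "age": [37, 37, 35, 21, 18, 21],
})

# aggfunc='first' takes the first value per (uid, msg) group instead of
# averaging, so it works on strings and tolerates repeated pairs.
times = df.pivot_table(
    index="uid", columns="msg", values="_time", aggfunc="first"
).reset_index()
print(times)
```

Users without a sale end up with NaN in the sale column. If the earliest timestamp is wanted rather than the first row, converting _time to datetimes before aggregating would be the safer route, since date strings in this format do not sort chronologically.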

x is the data frame that you have as input:

    uid                 msg    _time  gender  age
0     1  confirmed_settings  1/29/15       m   37
1     1                sale  4/13/15       m   37
2     3  confirmed_settings  4/19/15       m   35
3     4  confirmed_settings  2/21/15       m   21
4     5  confirmed_settings  3/28/15       m   18
5     4                sale  3/15/15       m   21

y = x.pivot(index='uid', columns='msg', values='_time')
x.join(y).drop('msg', axis=1)

gives you:

    uid    _time  gender  age  confirmed_settings     sale
0     1  1/29/15       m   37                 NaN      NaN
1     1  4/13/15       m   37             1/29/15  4/13/15
2     3  4/19/15       m   35                 NaN      NaN
3     4  2/21/15       m   21             4/19/15      NaN
4     5  3/28/15       m   18             2/21/15  3/15/15
5     4  3/15/15       m   21             3/28/15      NaN
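Note that in the output above the pivoted values are matched against x's row labels (0 to 5) rather than against uid, so row 1 carries uid 1's timestamps, row 3 carries uid 3's, and so on. A hedged variant that aligns on uid and returns one row per user is sketched below. It relies on the question's statement that every user has a confirmed_settings row, and the names base and result are introduced here only for illustration:

```python
import pandas as pd

# Sample data from the question.
x = pd.DataFrame({
    "uid": [1, 1, 3, 4, 5, 4],
    "msg": ["confirmed_settings", "sale", "confirmed_settings",
            "confirmed_settings", "confirmed_settings", "sale"],
    "_time": ["1/29/15", "4/13/15", "4/19/15", "2/21/15", "3/28/15", "3/15/15"],
    "gender": ["m"] * 6,
    "age": [37, 37, 35, 21, 18, 21],
})

# Works here because the sample has no repeated (uid, msg) pair.
y = x.pivot(index="uid", columns="msg", values="_time")

# Assumption from the question: every user has a confirmed_settings row,
# so these rows give exactly one line of user attributes per uid.
base = x[x["msg"] == "confirmed_settings"].drop(columns="msg")

# on="uid" matches the uid column of base against y's uid index,
# instead of matching row labels against uid values.
result = base.join(y, on="uid").reset_index(drop=True)
print(result)
```

On the sample data this yields four rows, for uids 1, 3, 4 and 5, with _time equal to the confirmed_settings timestamp and NaN in sale for the users who never bought anything.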
