Add suffixes to duplicate (column) cell values in Pandas depending on another column value (category)

general discussion forum gives members an opportunity to discuss proposed policy ideas prior to submitting as a formal
Post Reply
admin
Site Admin
Posts: 36
Joined: Sun Aug 08, 2021 7:49 am

Add suffixes to duplicate (column) cell values in Pandas depending on another column value (category)

Post by admin »

I have a structure like this:

Data_group Data Value
Group_x A 12
Group_x A 13
Group_x B 3
Group_x C 3
Group_x C 32
Group_x C 23
Group_y A 8
Group_y A 7
Group_y B 13
Group_y C 12
Group_y C 13
Group_y C 66
I would like to manipulate the Data column in order to have this as the output:

Data_group Data Value
Group_x A[0] 12
Group_x A[1] 13
Group_x B 3
Group_x C[0] 3
Group_x C[1] 32
Group_x C[2] 23
Group_y A[0] 8
Group_y A[1] 7
Group_y B 13
Group_y C[0] 12
Group_y C[1] 13
Group_y C[2] 66
Note that for every Group the index is reset and when there is just one entry of that Data there is no suffix. The Value column is not involved in the restructure. I have tried to exploit this:

mask = df['Data'].duplicated(keep=False)
df['Data'] += mask.cumcount().add(1).astype(str).radd('_').mask(df['Data'].transform('count')==1,'')

It will number the Data duplicate but the suffix will increment regardless of the Data Group. I started to do it with nested if and for but it appears cumbersome and not functional. Is there a clean way to use Pandas method have that result?

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.
Post Reply