Counting most common combination of values in dataframe column


I have a DataFrame in the following form:

ID  Product
1   A
1   B
2   A
3   A
3   C
3   D
4   A
4   B

I would like to count the most common combinations of two values from the Product column, grouped by ID.
For this example, the expected result would be:

Combination  Count
A-B          2
A-C          1
A-D          1
C-D          1

Is it possible to produce this output with pandas?

asked 9 hours ago

Alex T

python pandas
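The answers below assume a DataFrame named df holding the sample data. A minimal sketch for building it to experiment with; the column names and values are transcribed from the question's table, and df is simply the name the answers use:

```python
import pandas as pd

# Sample data transcribed from the question's table.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})
print(df)
```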


5 Answers

We can merge the frame with itself on ID and filter out duplicate pairs (this assumes you have a default RangeIndex). Then we sort each pair so that the grouping does not depend on order:

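import pandas as pd
import numpy as np

# Self-merge on ID; keeping index_x > index_y leaves each pair of rows exactly once
df1 = df.reset_index()
df1 = df1.merge(df1, on='ID').query('index_x > index_y')

# Sort each pair so that (B, A) and (A, B) land in the same group
df1 = pd.DataFrame(np.sort(df1[['Product_x', 'Product_y']].to_numpy(), axis=1))
df1.groupby([*df1]).size()

0  1
A  B    2
   C    1
   D    1
C  D    1
dtype: int64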
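A self-contained sketch of the same self-merge idea, assuming the question's sample data. It differs from the code above in two illustrative choices that are not part of the original answer: it rebuilds a fresh positional index first, so the default RangeIndex assumption is not needed, and it joins each sorted pair into an "A-B" string and counts with value_counts to match the layout the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# Drop whatever index df has and add a fresh positional column named "index",
# so every row has a unique, orderable position.
pos = df.reset_index(drop=True).reset_index()

# Pair every row with every other row of the same ID; index_x > index_y keeps
# each unordered pair once and drops self-pairs.
pairs = pos.merge(pos, on="ID").query("index_x > index_y")

# Sort each pair alphabetically so (B, A) and (A, B) become the same key.
sorted_pairs = np.sort(pairs[["Product_x", "Product_y"]].to_numpy(), axis=1)

combos = pd.Series(["-".join(p) for p in sorted_pairs])
result = combos.value_counts().rename_axis("Combination").reset_index(name="Count")
print(result)
```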

edited 8 hours ago

answered 8 hours ago

ALollz


You can use combinations from itertools along with groupby and apply:

from itertools import combinations
import pandas as pd

# Build a one-column frame of all 2-item pairs for a single ID group
def get_combs(x):
    return pd.DataFrame({'Combination': list(combinations(x.Product.values, 2))})

(df.groupby('ID').apply(get_combs)
 .reset_index(level=0)
 .groupby('Combination')
 .count()
)

             ID
Combination
(A, B)        2
(A, C)        1
(A, D)        1
(C, D)        1
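One caveat with the approach above: itertools.combinations emits pairs in row order, so if a group lists B before A the pair comes out as ('B', 'A') and is counted separately from ('A', 'B'). The sample data happens to be sorted within each ID. A minimal sketch that sorts each group before pairing; the reordered sample data is a hypothetical variation, and iterating the groupby directly (instead of apply) is a simplification of my own:

```python
from itertools import combinations

import pandas as pd

# Same groups as the question's sample, except ID 4 lists B before A
# (a hypothetical reordering to show why sorting matters).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "B", "A"],
})

# Sort products within each ID before pairing, so ("B", "A") becomes ("A", "B").
pairs = [
    "-".join(pair)
    for _, products in df.groupby("ID")["Product"]
    for pair in combinations(sorted(products), 2)
]

counts = pd.Series(pairs, dtype=object).value_counts()
print(counts)
```

Without the sorted() call, ID 4 would contribute "B-A" and the count for "A-B" would drop to 1.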

answered 8 hours ago

SIA


Using itertools and collections.Counter:

import itertools
from collections import Counter
import pandas as pd

# sorted() makes each pair's order independent of the row order within an ID
agg_ = lambda x: tuple(itertools.combinations(x, 2))
product = list(itertools.chain(*df.groupby('ID').agg({'Product': lambda x: agg_(sorted(x))}).Product))
# Wrapping in list() is optional: Counter also accepts the chain iterator directly
counts = Counter(product)

Output:

Counter({('A', 'B'): 2, ('A', 'C'): 1, ('A', 'D'): 1, ('C', 'D'): 1})

You can also convert the counts to a DataFrame:

pd.DataFrame(list(counts.items()), columns=['combination', 'count'])

  combination  count
0      (A, B)      2
1      (A, C)      1
2      (A, D)      1
3      (C, D)      1
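Since the question asks for the most common combination, Counter.most_common can pull the top entries straight from the counts. A sketch assuming the question's sample data; it feeds the Counter with itertools.chain.from_iterable over the groupby iterator instead of groupby().agg(), which is a swapped-in simplification rather than the answer's exact code:

```python
import itertools
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# Count sorted 2-item combinations per ID, chaining all groups into one stream.
counts = Counter(itertools.chain.from_iterable(
    itertools.combinations(sorted(products), 2)
    for _, products in df.groupby("ID")["Product"]
))

# most_common(n) returns the n highest counts as (pair, count) tuples.
top = counts.most_common(1)
print(top)  # [(('A', 'B'), 2)]
```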

edited 8 hours ago

answered 8 hours ago

Buckeye14Guy


Use itertools.combinations, explode and value_counts (Series.explode requires pandas 0.25 or later):

import itertools

(df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x, 2)))
                 .explode().str.join('-').value_counts())

Out[611]:
A-B    2
C-D    1
A-D    1
A-C    1
Name: Product, dtype: int64

Or:

import itertools

(df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x, 2))))
                 .explode().value_counts())

Out[597]:
A-B    2
C-D    1
A-D    1
A-C    1
Name: Product, dtype: int64
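A sketch that builds on the second variant: it adds sorted() so that pair order does not depend on row order within an ID, and reshapes the counts into the Combination/Count table from the question. The column names follow the question's expected output; the sample data and the remaining names are illustrative assumptions.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

exploded = (
    df.groupby("ID")["Product"]
    # One list of "X-Y" strings per ID; sorted() fixes the order inside each pair.
    .agg(lambda s: ["-".join(p) for p in itertools.combinations(sorted(s), 2)])
    # ID 2 has a single product, so its list is empty and explode yields NaN.
    .explode()
)

result = (
    exploded.dropna()
    .value_counts()
    .rename_axis("Combination")
    .reset_index(name="Count")
)
print(result)
```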

edited 8 hours ago

answered 8 hours ago

Andy L.


Another approach using itertools.combinations:

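from itertools import combinations
import pandas as pd

test_df = ... # your df
# Parentheses let the method chain span several lines (without them this is a SyntaxError).
# rename_axis/reset_index(name=...) gives the same column names across pandas versions.
counts_df = (test_df.groupby('ID')['Product'].agg(lambda x: list(combinations(x, 2)))
             .apply(pd.Series).stack().value_counts()
             .rename_axis('Combination').reset_index(name='Count'))
print(counts_df)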

The output:

  Combination  Count
0      (A, B)      2
1      (A, C)      1
2      (A, D)      1
3      (C, D)      1
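A different, vectorized technique (not taken from any of the answers here) treats this as a co-occurrence problem: one-hot encode the products per ID with pd.crosstab, then multiply the matrix by its transpose, so entry (i, j) counts the IDs that contain both products. A sketch assuming the question's sample data; note that it counts each product at most once per ID, unlike the combinations-based answers, which would also pair repeated rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# One row per ID, one column per product: 1 if the ID has that product, else 0.
onehot = (pd.crosstab(df["ID"], df["Product"]) > 0).astype(int)
m = onehot.to_numpy()
products = onehot.columns

# co[i, j] = number of IDs that contain both products[i] and products[j].
co = m.T @ m

# The strict upper triangle lists each unordered pair once and skips i == j.
rows, cols = np.triu_indices(len(products), k=1)
pairs = pd.Series(
    co[rows, cols],
    index=[f"{products[i]}-{products[j]}" for i, j in zip(rows, cols)],
)
pairs = pairs[pairs > 0].sort_values(ascending=False, kind="stable")
print(pairs)
```

Because the intermediate matrix is products-by-products, memory grows with the square of the number of distinct products, so the combinations-based answers may suit data with many distinct products better.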

answered 8 hours ago

RomanPerekhrest

