Counting most common combination of values in dataframe column
I have a DataFrame in the following form:

    ID  Product
    1   A
    1   B
    2   A
    3   A
    3   C
    3   D
    4   A
    4   B

I would like to count the most common combinations of two values from the Product column, grouped by ID.

So for this example the expected result would be:

    Combination  Count
    A-B          2
    A-C          1
    A-D          1
    C-D          1

Is this output possible with pandas?
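
For anyone who wants to run the answers, the sample table above can be rebuilt roughly as follows. This is only a sketch: the variable name df is taken from the answers, and the integer/string column types are an assumption, since the question shows only the printed frame.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: ID holds integers and Product holds single-letter strings.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

print(df)
```
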
python pandas
asked 9 hours ago by Alex T
5 Answers

We can merge within ID and filter out duplicate merges (I assume you have a default RangeIndex). Then we sort so that the grouping is regardless of order:

    import pandas as pd
    import numpy as np

    df1 = df.reset_index()
    df1 = df1.merge(df1, on='ID').query('index_x > index_y')
    df1 = pd.DataFrame(np.sort(df1[['Product_x', 'Product_y']].to_numpy(), axis=1))
    df1.groupby([*df1]).size()

    0  1
    A  B    2
       C    1
       D    1
    C  D    1
    dtype: int64
answered 8 hours ago (edited 8 hours ago) by ALollz
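
To see why the answer above filters on the index and then sorts, it may help to look at the self-merge for a single ID. The sketch below rebuilds the question's sample data (the construction of df is an assumption) and restricts it to ID 3; the names sub, merged and kept are illustrative only.

```python
import pandas as pd

# Sample data from the question (assumed construction).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# Keep only ID 3 and expose the row labels as an "index" column.
sub = df[df["ID"] == 3].reset_index()

# Self-merge on ID: every row is paired with every row of the same ID,
# including itself, so 3 products give 3 * 3 = 9 rows.
merged = sub.merge(sub, on="ID")

# Requiring index_x > index_y drops self-pairs and mirrored duplicates,
# leaving one row per unordered pair.
kept = merged.query("index_x > index_y")

print(len(merged), len(kept))
print(kept[["Product_x", "Product_y"]])
```

The surviving rows hold the later product on the left (for example C paired with A), which is why the answer sorts each pair with np.sort before grouping.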

You can use combinations from itertools along with groupby and apply:

    from itertools import combinations

    import pandas as pd

    def get_combs(x):
        # One row per pair of products within this ID group
        return pd.DataFrame({'Combination': list(combinations(x.Product.values, 2))})

    (df.groupby('ID').apply(get_combs)
       .reset_index(level=0)
       .groupby('Combination')
       .count()
    )

                 ID
    Combination
    (A, B)        2
    (A, C)        1
    (A, D)        1
    (C, D)        1
answered 8 hours ago by SIA
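
One caveat with taking combinations in row order, as the answer above does: itertools.combinations emits pairs in the order the values appear, so (A, B) and (B, A) are counted separately unless the products are sorted within each ID first. The sketch below uses hypothetical data (the question's sample with the two rows of ID 4 swapped) and an illustrative helper, count_pairs, to show the difference.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical variation of the question's data: ID 4 lists B before A.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "B", "A"],
})

def count_pairs(frame, sort_within_group):
    """Count product pairs per ID, optionally sorting each group first."""
    counts = Counter()
    for _, products in frame.groupby("ID")["Product"]:
        items = sorted(products) if sort_within_group else list(products)
        counts.update(combinations(items, 2))
    return counts

unsorted_counts = count_pairs(df, sort_within_group=False)
sorted_counts = count_pairs(df, sort_within_group=True)

print(unsorted_counts)  # ('A', 'B') and ('B', 'A') are separate keys
print(sorted_counts)    # both occurrences fall under ('A', 'B')
```
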

Using itertools and Counter:

    import itertools
    from collections import Counter

    import pandas as pd

    agg_ = lambda x: tuple(itertools.combinations(x, 2))
    product = list(itertools.chain(*df.groupby('ID').agg({'Product': lambda x: agg_(sorted(x))}).Product))
    # You actually do not need to wrap product with list. The generator is ok
    counts = Counter(product)

Output:

    Counter({('A', 'B'): 2, ('A', 'C'): 1, ('A', 'D'): 1, ('C', 'D'): 1})

You could also do the following to get a DataFrame:

    pd.DataFrame(list(counts.items()), columns=['combination', 'count'])

      combination  count
    0      (A, B)      2
    1      (A, C)      1
    2      (A, D)      1
    3      (C, D)      1
answered 8 hours ago (edited 8 hours ago) by Buckeye14Guy
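
Since the question asks for the most common combination, a Counter like the one in the answer above can also be queried directly with most_common. The following is a minimal sketch under the assumption that df is built as in the question's sample; the generator expression is just one way to feed the Counter, not the answerer's exact code.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question (assumed construction).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# Sort products within each ID so a pair always appears in one order,
# then count every 2-combination across all IDs.
counts = Counter(
    pair
    for _, products in df.groupby("ID")["Product"]
    for pair in combinations(sorted(products), 2)
)

# most_common(n) returns the n highest counts as (pair, count) tuples.
top = counts.most_common(1)
print(top)  # [(('A', 'B'), 2)]

# Format the top pair like the "A-B" labels in the question.
label = "-".join(top[0][0])
print(label)  # A-B
```
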

Use itertools.combinations, explode and value_counts:

    import itertools

    (df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x, 2)))
       .explode().str.join('-').value_counts())

    Out[611]:
    A-B    2
    C-D    1
    A-D    1
    A-C    1
    Name: Product, dtype: int64

Or:

    import itertools

    (df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x, 2))))
       .explode().value_counts())

    Out[597]:
    A-B    2
    C-D    1
    A-D    1
    A-C    1
    Name: Product, dtype: int64
answered 8 hours ago (edited 8 hours ago) by Andy L.
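
If the result should be a two-column frame headed Combination and Count, as in the question's expected output, the explode approach above can be finished with rename_axis and reset_index. This is a sketch rather than the answerer's code: it assumes the sample df from the question and adds a sorted() call so each pair has a fixed order.

```python
from itertools import combinations

import pandas as pd

# Sample data from the question (assumed construction).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# One list of "X-Y" labels per ID, then one row per label.
# IDs with a single product yield an empty list, which explode turns
# into NaN; value_counts ignores NaN by default.
pairs = (
    df.groupby("ID")["Product"]
      .agg(lambda x: ["-".join(c) for c in combinations(sorted(x), 2)])
      .explode()
)

# Name the index and the counts explicitly so the column headers do not
# depend on the default naming of value_counts.
result = (
    pairs.value_counts()
         .rename_axis("Combination")
         .reset_index(name="Count")
)

print(result)
```

The order of rows with equal counts is not something to rely on; only the first row (A-B, 2) is determined by the data here.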

Another trick with the itertools.combinations function:

    from itertools import combinations
    import pandas as pd

    test_df = ... # your df

    # Parentheses let the method chain span several lines
    counts_df = (test_df.groupby('ID')['Product'].agg(lambda x: list(combinations(x, 2)))
                 .apply(pd.Series).stack().value_counts().to_frame()
                 # the counts column is named 0 in older pandas and 'count' in pandas >= 2.0
                 .reset_index().rename(columns={'index': 'Combination', 0: 'Count', 'count': 'Count'}))
    print(counts_df)

The output:

      Combination  Count
    0      (A, B)      2
    1      (A, C)      1
    2      (A, D)      1
    3      (C, D)      1
answered 8 hours ago by RomanPerekhrest