How to check whether a sublist exists in a huge database of lists in a fast way?
I have a database, for example datatest (with Length[datatest] == 10):
datatest={
{52, 2, 5, 1, 5, 1, 15, 2, 13, 2, 2},
{1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1},
{2, 6, 4, 3, 8, 4, 2, 9, 4, 6, 5},
{0, 0, 0, 0, 1, 2, 2, 3, 2, 0, 83},
{, 6, 4, 6, 2, 12, 4, 8, 2, 12, 5},
{1, 1, 2, 2, 3, 3, 4, 41, 11, 12, 1},
{1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305},
{32, 38, 46, 54, 64, 76, 89, 104, 122, 585, 760}
}
Now I have a list, for instance checklist = {40, 43, 49, 52}.
To find out whether checklist occurs in datatest, the simple approach is the following:
For[iii = 1, iii <= Length[datatest], iii++,
  exist = SequenceCount[datatest[[iii]], checklist];
  If[exist != 0,
    ... (* code omitted here; it only processes the data *)
    Break[],
    ... (* code omitted here; it only processes the data *)
  ];
];
I will search datatest many times, so when datatest is very large (that is, Length[datatest] is large), the For loop above takes a long time. Is there a faster way to do this?
I think the question can be restated as: how can I quickly check whether a sublist exists in a large nested list?
Thank you very much!
list-manipulation performance-tuning
asked 15 hours ago by Xuemei Gu, edited 11 hours ago by Alexey Popkov
2 Answers
Cases is pretty fast. Consider the case where you have 10,000 lists, each with 100 numbers. Cases can find all lists with the given subsequence in less than 0.05 seconds:
data = RandomInteger[{0, 100}, {10000, 100}];
checklist = {38, 3, 32, 24, 58, 8};
Cases[data, {___, Sequence @@ checklist, ___}] // RepeatedTiming
{0.049, {{26, 3, 93, 13, 11, 80, ...}}}
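With uniformly random rows, a six-element checklist is unlikely to occur by chance, so a benchmark that should contain at least one hit can take its checklist from the data itself. The following is only a sketch of that idea: the seed, the row number 5000, and the span 20 ;; 25 are arbitrary choices made for illustration, and the timing will differ from machine to machine.

```wolfram
SeedRandom[1];  (* arbitrary seed, only for reproducibility *)
data = RandomInteger[{0, 100}, {10000, 100}];
checklist = data[[5000, 20 ;; 25]];  (* six consecutive entries of row 5000 *)
pattern = {___, Sequence @@ checklist, ___};

matches = Cases[data, pattern];
MemberQ[matches, data[[5000]]]  (* True: row 5000 contains checklist by construction *)
First@RepeatedTiming[Cases[data, pattern];]  (* seconds; machine dependent *)
```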
If you only need the first such list, use FirstCase. If you only need the positions, use Position; similarly, FirstPosition gives just the position of the first matching list.
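As a small illustration of how these three functions differ, here is a sketch on a made-up three-row dataset (the values and the name pattern are assumptions for this example, not part of the question's data). The second row contains 3 and 4 adjacent to each other, the first row contains them only non-adjacently, so the first row does not match.

```wolfram
(* Hypothetical toy data, chosen so that rows 2 and 3 match and row 1 does not *)
data = {{7, 3, 9, 4, 0}, {1, 2, 3, 4, 5}, {3, 4, 8, 8, 8}};
checklist = {3, 4};
pattern = {___, Sequence @@ checklist, ___};

FirstCase[data, pattern]      (* {1, 2, 3, 4, 5} *)
Position[data, pattern]       (* {{2}, {3}} *)
FirstPosition[data, pattern]  (* {2} *)
```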
In case you need both the position and the list, use this:
pos = FirstPosition[data, {___, Sequence @@ checklist, ___}];
list = Extract[data, pos];
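When no row matches, FirstPosition returns Missing["NotFound"] by default, and passing that to Extract does not give a useful result. One possible way to guard against this is sketched below; the helper name findFirst, the None default, and the toy data are assumptions made for illustration.

```wolfram
(* Hypothetical helper: returns {rowIndex, row} for the first match, or None *)
findFirst[rows_List, sub_List] := Module[{pos},
  (* search only level 1, i.e. the rows themselves; None is the not-found default *)
  pos = FirstPosition[rows, {___, Sequence @@ sub, ___}, None, {1}];
  If[pos === None, None, {First[pos], Extract[rows, pos]}]
]

toy = {{7, 3, 9, 4, 0}, {1, 2, 3, 4, 5}, {3, 4, 8, 8, 8}};
findFirst[toy, {3, 4}]  (* {2, {1, 2, 3, 4, 5}} *)
findFirst[toy, {9, 9}]  (* None *)
```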
answered 14 hours ago by C. E. (edited 14 hours ago)
Cool, thank you very much! In addition, Cases[data, {___, Sequence @@ checklist, ___}] gives the list that was found; is it possible to get the first position at the same time? Use FirstPosition? – Xuemei Gu, 14 hours ago
@XuemeiGu I added an example of how to find the position of the first list and the list itself. – C. E., 14 hours ago
Thank you very much! Nice solution! – Xuemei Gu, 14 hours ago
Instead of using SequenceCount in a loop, locate all possible matches of the checklist with SequencePosition in the flattened dataset in one go:
exist = SequencePosition[Flatten@datatest, checklist] //
    Quotient[(# - 1), Last@Dimensions@datatest] & //
    AnyTrue[Apply[SameQ]]
Because a false match may occur in the flattened dataset, we only consider matches where the beginning index and the end index of the match lie in the same row.
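To see why the same-row check matters, consider a made-up 2×3 dataset in which {3, 4} appears in the flattened list only because it straddles the boundary between the two rows. The names rows, width, spans, and spans2 are hypothetical; note also that this approach relies on Dimensions, so it assumes all rows have the same length.

```wolfram
(* Hypothetical example: {3, 4} spans the row boundary, {5, 6} lies inside row 2 *)
rows = {{1, 2, 3}, {4, 5, 6}};
width = Last@Dimensions@rows;  (* 3 *)

spans = SequencePosition[Flatten@rows, {3, 4}]  (* {{3, 4}} *)
Quotient[spans - 1, width]  (* {{0, 1}}: start and end fall in different rows *)
AnyTrue[Quotient[spans - 1, width], Apply[SameQ]]  (* False *)

spans2 = SequencePosition[Flatten@rows, {5, 6}]  (* {{5, 6}} *)
AnyTrue[Quotient[spans2 - 1, width], Apply[SameQ]]  (* True *)
```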
answered 14 hours ago by sakra