How to check whether a sublist exists in a huge database of lists in a fast way?


I have a database, for example datatest (in the example below, Length[datatest] = 9):

datatest = {
  {52, 2, 5, 1, 5, 1, 15, 2, 13, 2, 2},
  {1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1},
  {2, 6, 4, 3, 8, 4, 2, 9, 4, 6, 5},
  {0, 0, 0, 0, 1, 2, 2, 3, 2, 0, 83},
  {, 6, 4, 6, 2, 12, 4, 8, 2, 12, 5},
  {1, 1, 2, 2, 3, 3, 4, 41, 11, 12, 1},
  {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
  {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305},
  {32, 38, 46, 54, 64, 76, 89, 104, 122, 585, 760}
}

Now I have a list, for instance checklist = {40, 43, 49, 52}.

To check whether checklist occurs as a contiguous sublist of any row of datatest, the straightforward way is the following:

For[iii = 1, iii <= Length[datatest], iii++,
    exist = SequenceCount[datatest[[iii]], checklist];
    If[exist != 0,
       ... (* code omitted here: processing the data *)
       Break[],
       ... (* code omitted here: processing the data *)
    ];
];

I need to search datatest many times, so when datatest is very large (Length[datatest] is large), the For loop above takes a lot of time. Is there a faster way to do this?

In other words: how can I quickly check whether a sublist exists in a large nested list?

Thank you very much!

edited 11 hours ago

Alexey Popkov

39.5k4 gold badges111 silver badges273 bronze badges

asked 15 hours ago

Xuemei Gu

814 bronze badges

list-manipulation performance-tuning


2 Answers

Cases is pretty fast. Consider 10,000 lists, each with 100 numbers: in the timing below, Cases finds all lists containing the given subsequence in less than 0.05 seconds:

data = RandomInteger[{0, 100}, {10000, 100}];
checklist = {38, 3, 32, 24, 58, 8};
Cases[data, {___, Sequence @@ checklist, ___}] // RepeatedTiming

{0.049, {{26, 3, 93, 13, 11, 80, ... }}
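As a sketch of how the same pattern applies to the question's own data, here it is on three rows taken from datatest (the excerpt and the name patt are illustrative, not part of the answer). Since the question only needs to know whether a match exists, MemberQ with the same pattern returns True or False directly:

```mathematica
(* Three rows taken from the question's datatest (illustrative excerpt) *)
datatest = {
   {2, 6, 4, 3, 8, 4, 2, 9, 4, 6, 5},
   {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305},
   {32, 38, 46, 54, 64, 76, 89, 104, 122, 585, 760}
   };
checklist = {40, 43, 49, 52};

(* Any prefix, then the elements of checklist in order, then any suffix;
   Sequence @@ splices the elements of checklist into the pattern *)
patt = {___, Sequence @@ checklist, ___};

Cases[datatest, patt]
(* {{15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305}} *)

(* Yes/no answer: does any row of datatest match? *)
MemberQ[datatest, patt]
(* True *)
```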

If you only need the first matching list, use FirstCase. If you need the positions of the matching lists, use Position, or FirstPosition if you only need the position of the first one.
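The sketch below shows these three alternatives side by side on a small illustrative excerpt of the question's data. The level specification {1} is an addition not in the answer: it restricts Position and FirstPosition to the rows of datatest.

```mathematica
(* Illustrative excerpt of the question's datatest *)
datatest = {
   {2, 6, 4, 3, 8, 4, 2, 9, 4, 6, 5},
   {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305},
   {32, 38, 46, 54, 64, 76, 89, 104, 122, 585, 760}
   };
patt = {___, Sequence @@ {40, 43, 49, 52}, ___};

(* First matching row *)
FirstCase[datatest, patt]
(* {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305} *)

(* Positions of all matching rows, searching level 1 only *)
Position[datatest, patt, {1}]
(* {{2}} *)

(* Position of the first matching row; the third argument is returned if nothing matches *)
FirstPosition[datatest, patt, Missing["NotFound"], {1}]
(* {2} *)
```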

In case you need both the position and the list, use this:

pos = FirstPosition[data, {___, Sequence @@ checklist, ___}];
list = Extract[data, pos];
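If no row matches, FirstPosition returns Missing["NotFound"] by default, which Extract cannot use as a position. A minimal sketch that guards against this, assuming a hypothetical helper named findRow and returning None when there is no match:

```mathematica
(* Illustrative excerpt of the question's datatest *)
datatest = {
   {2, 6, 4, 3, 8, 4, 2, 9, 4, 6, 5},
   {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305},
   {32, 38, 46, 54, 64, 76, 89, 104, 122, 585, 760}
   };

(* Hypothetical helper: returns {position, row} for the first row that contains
   check as a contiguous sublist, or None if no row does *)
findRow[data_List, check_List] := Module[{pos},
  pos = FirstPosition[data, {___, Sequence @@ check, ___}, None, {1}];
  If[pos === None, None, {pos, Extract[data, pos]}]
  ]

findRow[datatest, {40, 43, 49, 52}]
(* {{2}, {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305}} *)

findRow[datatest, {52, 40}]
(* None *)
```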

edited 14 hours ago

answered 14 hours ago

C. E.

54.6k3 gold badges105 silver badges214 bronze badges

Cool, thank you very much! In addition, Cases[data, {___, Sequence @@ checklist, ___}] gives the list you found. Is it possible to get the first position at the same time? Use FirstPosition?
– Xuemei Gu, 14 hours ago

@XuemeiGu I added an example of how to find the position of the first list and the list itself.
– C. E., 14 hours ago

Thank you very much! Nice solution!
– Xuemei Gu, 14 hours ago


Instead of using SequenceCount in a loop, locate all possible matches of the checklist with SequencePosition in the flattened dataset in one go:

exist = SequencePosition[Flatten@datatest, checklist] //
    Quotient[(# - 1), Last@Dimensions@datatest] & //
    AnyTrue[Apply[SameQ]]

Because flattening can create false matches that span two adjacent rows, we only consider matches whose beginning index and end index lie in the same row. Computing the row index with Quotient this way assumes that all rows of datatest have the same length.
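To see why the same-row filter is needed, here is a toy example (toy and rowCheck are illustrative names, not from the answer). rowCheck is intended to perform the same computation as the pipeline above, written as nested calls, and likewise assumes a rectangular array:

```mathematica
(* Two rows of length 2; {2, 3} appears only across the row boundary *)
toy = {{1, 2}, {3, 4}};

SequencePosition[Flatten[toy], {2, 3}]
(* {{2, 3}}: a match in the flattened list that spans rows 1 and 2 *)

(* Same-row filter: convert flat start/end indices to 0-based row indices
   and require them to be equal; assumes all rows have the same length *)
rowCheck[data_List, check_List] := AnyTrue[
  Quotient[SequencePosition[Flatten[data], check] - 1, Last[Dimensions[data]]],
  Apply[SameQ]]

rowCheck[toy, {2, 3}]
(* False: the only match crosses a row boundary *)

rowCheck[toy, {3, 4}]
(* True *)
```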

answered 14 hours ago

sakra

3,08314 silver badges29 bronze badges
