


How to check whether a sublist exists in a huge database of lists in a fast way?






I have a database, for example datatest (Length[datatest] = 10):

    datatest = {
      {52, 2, 5, 1, 5, 1, 15, 2, 13, 2, 2},
      {1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1},
      {2, 6, 4, 3, 8, 4, 2, 9, 4, 6, 5},
      {0, 0, 0, 0, 1, 2, 2, 3, 2, 0, 83},
      {, 6, 4, 6, 2, 12, 4, 8, 2, 12, 5},
      {1, 1, 2, 2, 3, 3, 4, 41, 11, 12, 1},
      {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
      {15, 22, 40, 43, 49, 52, 58, 14, 120, 150, 305},
      {32, 38, 46, 54, 64, 76, 89, 104, 122, 585, 760}
    }

I also have a list, for instance checklist = {40, 43, 49, 52}.

To find out whether checklist occurs as a consecutive run inside one of the lists in datatest, the easy way is the following:

    For[iii = 1, iii <= Length[datatest], iii++,
      exist = SequenceCount[datatest[[iii]], checklist];
      If[exist != 0,
        ... (* code omitted here; it just deals with the data *)
        Break[],
        ... (* code omitted here; it just deals with the data *)
      ];
    ];

I will search datatest many times, so when datatest is very large (Length[datatest] is large), the For loop above takes a lot of time. Is there a faster way to do this?

I think the question can be restated as: how can I quickly check whether a sublist exists in a large nested list?

Thank you very much!







      list-manipulation performance-tuning














edited 11 hours ago by Alexey Popkov

asked 15 hours ago by Xuemei Gu

























          2 Answers



Cases is pretty fast. Consider the case where you have 10,000 lists, each with 100 numbers. Cases can find all lists that contain the given subsequence in less than 0.05 seconds:

    data = RandomInteger[{0, 100}, {10000, 100}];
    checklist = {38, 3, 32, 24, 58, 8};

    Cases[data, {___, Sequence @@ checklist, ___}] // RepeatedTiming

    {0.049, {{26, 3, 93, 13, 11, 80, ...}}}

If you only need the first list that contains the subsequence, use FirstCase. If you only need the positions of the matching lists, use Position, and if you only need the position of the first matching list, use FirstPosition.

In case you need both the position and the list, use this:

    pos = FirstPosition[data, {___, Sequence @@ checklist, ___}];
    list = Extract[data, pos];
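When no list contains the subsequence, FirstPosition returns Missing["NotFound"] by default, so the Extract call above would not give a usable result. A minimal sketch of one way to guard against that follows; the helper name findRow and the keys "Row" and "List" are hypothetical and not part of the answer above, and the search is restricted to level 1 on the assumption that data is a list of flat lists.

```mathematica
(* Hypothetical helper (not from the original answer): returns the index and
   contents of the first row containing checklist as a consecutive run,
   or Missing["NotFound"] if there is no such row. *)
findRow[data_List, checklist_List] :=
  Module[{pos},
    (* Search only at level 1, i.e. among the rows themselves. *)
    pos = FirstPosition[data, {___, Sequence @@ checklist, ___},
      Missing["NotFound"], {1}];
    If[MissingQ[pos],
      pos,
      <|"Row" -> First[pos], "List" -> Extract[data, pos]|>
    ]
  ]

small = {{1, 2, 3, 4}, {5, 6, 7, 8}, {2, 3, 9, 9}};

findRow[small, {6, 7}]
(* <|"Row" -> 2, "List" -> {5, 6, 7, 8}|> *)

findRow[small, {4, 5}]
(* Missing["NotFound"]: 4 and 5 are adjacent only across a row boundary *)
```

Checking the result with MissingQ before using it keeps the "not found" case explicit in the calling code.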














edited 14 hours ago

answered 14 hours ago by C. E.















• Cool, thank you very much! In addition, Cases[data, {___, Sequence @@ checklist, ___}] gives the lists that are found; is it possible to get the first position at the same time? With FirstPosition?
  – Xuemei Gu, 14 hours ago

• @XuemeiGu I added an example of how to find the position of the first list and the list itself.
  – C. E., 14 hours ago

• Thank you very much! Nice solution!
  – Xuemei Gu, 14 hours ago





















Instead of using SequenceCount in a loop, locate all possible matches of checklist with SequencePosition in the flattened dataset in one go:

    exist = SequencePosition[Flatten@datatest, checklist] //
      Quotient[(# - 1), Last@Dimensions@datatest] & //
      AnyTrue[Apply[SameQ]]

Because a false match may occur in the flattened dataset (a match can run from the end of one row into the start of the next), we only consider matches where the beginning index and the end index of the match lie in the same row. Quotient converts each flat index into a row number, which requires all rows to have the same length.
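The row-boundary check can be seen on a tiny example. The following is a sketch under the assumption that the data is a rectangular matrix; the function name rowsContaining is hypothetical and is not part of the answer above. It returns the 1-based indices of the rows that contain the checklist, instead of a single True or False.

```mathematica
(* Hypothetical helper (not from the original answer): 1-based indices of the
   rows of a rectangular matrix that contain checklist as a consecutive run. *)
rowsContaining[data_?MatrixQ, checklist_List] :=
  Module[{ncols = Last@Dimensions@data, rows},
    (* {start, end} flat indices of every match, mapped to 0-based row numbers *)
    rows = Quotient[SequencePosition[Flatten@data, checklist] - 1, ncols];
    (* keep only matches that start and end in the same row *)
    Union[Cases[rows, {r_, r_} :> r + 1]]
  ]

small = {{1, 2, 3}, {4, 5, 6}};

SequencePosition[Flatten@small, {3, 4}]
(* {{3, 4}}: a match in the flattened list that straddles rows 1 and 2 *)

rowsContaining[small, {3, 4}]
(* {} : the straddling match is discarded *)

rowsContaining[small, {5, 6}]
(* {2} *)
```

Returning row indices rather than a Boolean is a design choice for illustration only; wrapping the result in `=!= {}` recovers the yes/no answer.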

















answered 14 hours ago by sakra
































