Finding collisions of the first few bits of a SHA-1 hashUsing Levenstein distance to compare stringsReview of...

USPS Back Room - Trespassing?

Should there be an "a" before "ten years imprisonment"?

Is it possible to prohibit all prohibitable schools of magic with a single character?

How to melt snow without fire or body heat?

Security vulnerabilities of POST over SSL

Is it legal to have an abortion in another state or abroad?

Why are Stein manifolds/spaces the analog of affine varieties/schemes in algebraic geometry?

Parallel fifths in the orchestra

On San Andreas Speedruns, why do players blow up the Picador in the mission Ryder?

Python program to take in two strings and print the larger string

What does kpsewhich stand for?

Mercedes C180 (W204) dash symbol

Writing style before Elements of Style

Must a warlock replace spells with new spells of exactly their Pact Magic spell slot level?

SFDX: where can set Field-level security and accessibility?

How does the EU Emissions Trading Scheme account for increased emissions outside the EU?

How do I superimpose two math symbols?

Why isn't Tyrion mentioned in the in-universe book "A Song of Ice and Fire"?

How to deal with a colleague who is being aggressive?

How to patch glass cuts in a bicycle tire?

What is the use case for non-breathable waterproof pants?

How can I tell if I'm being too picky as a referee?

Need to read my home electrical Meter

Why was this character made Grand Maester?



Finding collisions of the first few bits of a SHA-1 hash


Using Levenstein distance to compare stringsReview of reservoir samplingKarp-Rabin with bitwise hashMinimum window substring which includes all the characters from substringOptimizing Java SHA-512 String-to-Hash GeneratorHackerEarth - SimpleFunctionLeetcode 49: Group Anagrams - Hash function design talkFollow-up 1: Compare 2 unordered, rooted trees for shape-isomorphismChecking whether one string contains all the characters of another stringFinding the lowest possible number of changes to turn one string into another






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







5












$begingroup$


My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.



How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)



Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found



public static boolean findCollision(int x1, int x2) {

String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";


//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);

String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);

if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
}


Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.



public static void main(String[] args) {

Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;

while (true) {

while(true){

x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);

if (x1 != x2)
break;
}

if (findCollision(x1, x2) == true) {
break;
}
counter++;
}

System.out.println("nNumber of trials: " + counter);
}


If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.










share|improve this question









New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$












  • $begingroup$
    Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
    $endgroup$
    – Astral Zhang
    6 hours ago










  • $begingroup$
    You should mention this in your question, I believe it's important :)
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    i have done so. thanks!
    $endgroup$
    – Astral Zhang
    6 hours ago


















5












$begingroup$


My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.



How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)



Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found



public static boolean findCollision(int x1, int x2) {

String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";


//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);

String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);

if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
}


Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.



public static void main(String[] args) {

Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;

while (true) {

while(true){

x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);

if (x1 != x2)
break;
}

if (findCollision(x1, x2) == true) {
break;
}
counter++;
}

System.out.println("nNumber of trials: " + counter);
}


If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.










share|improve this question









New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$












  • $begingroup$
    Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
    $endgroup$
    – Astral Zhang
    6 hours ago










  • $begingroup$
    You should mention this in your question, I believe it's important :)
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    i have done so. thanks!
    $endgroup$
    – Astral Zhang
    6 hours ago














5












5








5





$begingroup$


My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.



How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)



Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found



public static boolean findCollision(int x1, int x2) {

String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";


//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);

String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);

if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
}


Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.



public static void main(String[] args) {

Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;

while (true) {

while(true){

x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);

if (x1 != x2)
break;
}

if (findCollision(x1, x2) == true) {
break;
}
counter++;
}

System.out.println("nNumber of trials: " + counter);
}


If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.










share|improve this question









New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$




My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.



How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)



Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found



public static boolean findCollision(int x1, int x2) {

String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";


//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);

String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);

if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
}


Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.



public static void main(String[] args) {

Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;

while (true) {

while(true){

x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);

if (x1 != x2)
break;
}

if (findCollision(x1, x2) == true) {
break;
}
counter++;
}

System.out.println("nNumber of trials: " + counter);
}


If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.







java time-limit-exceeded cryptography hashcode






share|improve this question









New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.










share|improve this question









New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








share|improve this question




share|improve this question








edited 5 hours ago









200_success

132k20160427




132k20160427






New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








asked 7 hours ago









Astral ZhangAstral Zhang

262




262




New contributor



Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




New contributor




Astral Zhang is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.














  • $begingroup$
    Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
    $endgroup$
    – Astral Zhang
    6 hours ago










  • $begingroup$
    You should mention this in your question, I believe it's important :)
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    i have done so. thanks!
    $endgroup$
    – Astral Zhang
    6 hours ago


















  • $begingroup$
    Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
    $endgroup$
    – Astral Zhang
    6 hours ago










  • $begingroup$
    You should mention this in your question, I believe it's important :)
    $endgroup$
    – IEatBagels
    6 hours ago










  • $begingroup$
    i have done so. thanks!
    $endgroup$
    – Astral Zhang
    6 hours ago
















$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
6 hours ago




$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
6 hours ago












$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
6 hours ago




$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
6 hours ago












$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
6 hours ago




$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
6 hours ago












$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
6 hours ago




$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
6 hours ago










2 Answers
2






active

oldest

votes


















2












$begingroup$

With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.



However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)






share|improve this answer








New contributor



Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





$endgroup$





















    1












    $begingroup$

    I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.



    Considering your have 36 bits of data, it means you have a total number of possibilities of $2^{36} = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.



    So that's easy! All you need to do is run this (pseudocode):



    hashset = (new hashset that uses your hashing algorithm)
    for(i = 0; i < 68719476736 + 1; i++) {

    if hashset.contains(string(i)) break;

    hashset.add(string(i));
    }

    print("This took " + i + " tried!")


    With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.



    My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.



    So the question is, what could we do to make this better?



    We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.



    We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.



    What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.



    The pseudocode would look like this :



    generator = RandomBigIntGenerator()
    numbers = []

    for(i = 0; i < 308652; i++) {
    numbers.add(generator.random(1,68719476736+1))
    }

    hashset = {} (With your hashing function)
    for n in numbers {
    if hashset.contains(n) {
    print("yay done");
    break;
    }
    hashset.add(m);
    }


    All in all, you can hope to have collisions, but you need computing power.



    In a code review point of view :




    • You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.

    • Don't be too random when you create your strings.

    • Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)






    share|improve this answer









    $endgroup$













    • $begingroup$
      I will try this birthday paradox method right now. Hopefully it will work.
      $endgroup$
      – Astral Zhang
      5 hours ago










    • $begingroup$
      i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
      $endgroup$
      – Astral Zhang
      4 hours ago












    • $begingroup$
      how do you obtain this value 308652?
      $endgroup$
      – Astral Zhang
      4 hours ago












    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f220756%2ffinding-collisions-of-the-first-few-bits-of-a-sha-1-hash%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2












    $begingroup$

    With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.



    However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)






    share|improve this answer








    New contributor



    Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.





    $endgroup$


















      2












      $begingroup$

      With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.



      However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)






      share|improve this answer








      New contributor



      Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      $endgroup$
















        2












        2








        2





        $begingroup$

        With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.



        However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)






        share|improve this answer








        New contributor



        Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.





        $endgroup$



        With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.



        However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)







        share|improve this answer








        New contributor



        Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.








        share|improve this answer



        share|improve this answer






        New contributor



        Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.








        answered 5 hours ago









        Hans-Martin MosnerHans-Martin Mosner

        1212




        1212




        New contributor



        Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.




        New contributor




        Hans-Martin Mosner is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.



























            1












            $begingroup$

            I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.



            Considering your have 36 bits of data, it means you have a total number of possibilities of $2^{36} = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.



            So that's easy! All you need to do is run this (pseudocode):



            hashset = (new hashset that uses your hashing algorithm)
            for(i = 0; i < 68719476736 + 1; i++) {

            if hashset.contains(string(i)) break;

            hashset.add(string(i));
            }

            print("This took " + i + " tried!")


            With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.



            My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.



            So the question is, what could we do to make this better?



            We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.



            We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.



            What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.



            The pseudocode would look like this :



            generator = RandomBigIntGenerator()
            numbers = []

            for(i = 0; i < 308652; i++) {
            numbers.add(generator.random(1,68719476736+1))
            }

            hashset = {} (With your hashing function)
            for n in numbers {
            if hashset.contains(n) {
            print("yay done");
            break;
            }
            hashset.add(m);
            }


            All in all, you can hope to have collisions, but you need computing power.



            In a code review point of view :




            • You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.

            • Don't be too random when you create your strings.

            • Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)






            share|improve this answer









            $endgroup$













            • $begingroup$
              I will try this birthday paradox method right now. Hopefully it will work.
              $endgroup$
              – Astral Zhang
              5 hours ago










            • $begingroup$
              i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
              $endgroup$
              – Astral Zhang
              4 hours ago












            • $begingroup$
              how do you obtain this value 308652?
              $endgroup$
              – Astral Zhang
              4 hours ago
















            1












            $begingroup$

            I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.



            Considering your have 36 bits of data, it means you have a total number of possibilities of $2^{36} = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.



            So that's easy! All you need to do is run this (pseudocode):



            hashset = (new hashset that uses your hashing algorithm)
            for(i = 0; i < 68719476736 + 1; i++) {

            if hashset.contains(string(i)) break;

            hashset.add(string(i));
            }

            print("This took " + i + " tried!")


            With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.



            My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.



            So the question is, what could we do to make this better?



            We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.



            We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.



            What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.



            The pseudocode would look like this :



            generator = RandomBigIntGenerator()
            numbers = []

            for(i = 0; i < 308652; i++) {
            numbers.add(generator.random(1,68719476736+1))
            }

            hashset = {} (With your hashing function)
            for n in numbers {
            if hashset.contains(n) {
            print("yay done");
            break;
            }
            hashset.add(m);
            }


            All in all, you can hope to have collisions, but you need computing power.



            In a code review point of view :




            • You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.

            • Don't be too random when you create your strings.

            • Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)






            share|improve this answer









            $endgroup$













            • $begingroup$
              I will try this birthday paradox method right now. Hopefully it will work.
              $endgroup$
              – Astral Zhang
              5 hours ago










            • $begingroup$
              i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
              $endgroup$
              – Astral Zhang
              4 hours ago












            • $begingroup$
              how do you obtain this value 308652?
              $endgroup$
              – Astral Zhang
              4 hours ago














            1












            1








            1





            $begingroup$

            I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.



            Considering your have 36 bits of data, it means you have a total number of possibilities of $2^{36} = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.



            So that's easy! All you need to do is run this (pseudocode):



            hashset = (new hashset that uses your hashing algorithm)
            for(i = 0; i < 68719476736 + 1; i++) {

            if hashset.contains(string(i)) break;

            hashset.add(string(i));
            }

            print("This took " + i + " tried!")


            With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.



            My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.



            So the question is, what could we do to make this better?



            We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.



            We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.



            What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.



            The pseudocode would look like this :



            generator = RandomBigIntGenerator()
            numbers = []

            for(i = 0; i < 308652; i++) {
            numbers.add(generator.random(1,68719476736+1))
            }

            hashset = {} (With your hashing function)
            for n in numbers {
            if hashset.contains(n) {
            print("yay done");
            break;
            }
            hashset.add(m);
            }


            All in all, you can hope to have collisions, but you need computing power.



            In a code review point of view :




            • You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.

            • Don't be too random when you create your strings.

            • Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)






            share|improve this answer









            $endgroup$



            I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.



            Considering your have 36 bits of data, it means you have a total number of possibilities of $2^{36} = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.



            So that's easy! All you need to do is run this (pseudocode):



            hashset = (new hashset that uses your hashing algorithm)
            for(i = 0; i < 68719476736 + 1; i++) {

            if hashset.contains(string(i)) break;

            hashset.add(string(i));
            }

            print("This took " + i + " tried!")


            With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.



            My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.



            So the question is, what could we do to make this better?



            We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.



            We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.



            What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.



            The pseudocode would look like this :



            generator = RandomBigIntGenerator()
            numbers = []

            for(i = 0; i < 308652; i++) {
            numbers.add(generator.random(1,68719476736+1))
            }

            hashset = {} (With your hashing function)
            for n in numbers {
            if hashset.contains(n) {
            print("yay done");
            break;
            }
            hashset.add(m);
            }


            All in all, you can hope to have collisions, but you need computing power.



            In a code review point of view :




            • You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.

            • Don't be too random when you create your strings.

            • Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 5 hours ago









            IEatBagelsIEatBagels

            9,08223579




            9,08223579












            • $begingroup$
              I will try this birthday paradox method right now. Hopefully it will work.
              $endgroup$
              – Astral Zhang
              5 hours ago










            • $begingroup$
              i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
              $endgroup$
              – Astral Zhang
              4 hours ago












            • $begingroup$
              how do you obtain this value 308652?
              $endgroup$
              – Astral Zhang
              4 hours ago


















            • $begingroup$
              I will try this birthday paradox method right now. Hopefully it will work.
              $endgroup$
              – Astral Zhang
              5 hours ago










            • $begingroup$
              i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
              $endgroup$
              – Astral Zhang
              4 hours ago












            • $begingroup$
              how do you obtain this value 308652?
              $endgroup$
              – Astral Zhang
              4 hours ago
















            $begingroup$
            I will try this birthday paradox method right now. Hopefully it will work.
            $endgroup$
            – Astral Zhang
            5 hours ago




            $begingroup$
            I will try this birthday paradox method right now. Hopefully it will work.
            $endgroup$
            – Astral Zhang
            5 hours ago












            $begingroup$
            i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
            $endgroup$
            – Astral Zhang
            4 hours ago






            $begingroup$
            i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
            $endgroup$
            – Astral Zhang
            4 hours ago














            $begingroup$
            how do you obtain this value 308652?
            $endgroup$
            – Astral Zhang
            4 hours ago




            $begingroup$
            how do you obtain this value 308652?
            $endgroup$
            – Astral Zhang
            4 hours ago










            Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.













            Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.












            Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.
















            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f220756%2ffinding-collisions-of-the-first-few-bits-of-a-sha-1-hash%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Taj Mahal Inhaltsverzeichnis Aufbau | Geschichte | 350-Jahr-Feier | Heutige Bedeutung | Siehe auch |...

            Baia Sprie Cuprins Etimologie | Istorie | Demografie | Politică și administrație | Arii naturale...

            Ciclooctatetraenă Vezi și | Bibliografie | Meniu de navigare637866text4148569-500570979m