Better guard your server farm against sharks (Or how unfathomably rare GUID collisions are)

2019-05-10

At work, I have made a standing bet with some people that in our systems, we will never ever see the same GUID generated twice. I am so confident about this that I wanted to bet a zillion euros, but seeing that a fake amount of money does not make for a useful wager, we settled for a crate of beer instead. Why am I so confident in this bet? Because I do not use my gut feeling, but statistics!

GUID?

First things first: GUID stands for "Globally Unique Identifier", a sort of bar-code used to tag something digitally. A GUID is written as a string of 36 characters: 32 random hexadecimal digits, each one of the 16 possible characters in the range 0-9 and A-F, split into groups by four hyphens, like this:

  • 2386d36e-8a21-4c4c-868d-21f447628c41 or
  • fd4f6378-20a1-4829-9bb4-c3e6d83d541c or
  • e7dd70c3-d891-4ae3-b3f4-81a70acd00a0 or
  • ...

I could do this literally forever. The entire idea of the GUID is that whenever you need one in your system, you just pick 32 random hexadecimal characters, put them in a row with the hyphens in between, and you are done. It really does not matter which characters you pick: you can be sure that the specific sequence you just made has never been used before in your system.

I can hear you think: "How is that possible? Surely if I randomly pick one, there is a reasonable chance someone else already picked the same sequence before, right? Think of computers! They are very good at being very fast! Picking the same GUID twice must happen all the freaking time!" However, that is where you are wrong, dear reader. It is where many people are wrong and thus, every now and then, I have to defend my position on the possibility of GUID collisions again. Therefore, I will do it once more in this blog post, but now for everyone to see!
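In practice you rarely pick the characters yourself; most languages ship a generator. As a minimal sketch, assuming Python and its standard `uuid` module (the helper name `new_guid` is made up for this example), generating random GUIDs like the ones listed above could look like this:

```python
import uuid


def new_guid() -> str:
    """Return a freshly generated random (version 4) GUID as a 36-character string."""
    return str(uuid.uuid4())


# Print a few examples; every run produces different values.
for _ in range(3):
    print(new_guid())
```

Each printed value has the same shape as the examples above: 32 hexadecimal digits split into five groups by four hyphens.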

The Human Brain

The reason this thought process happens, and why GUIDs are so hard to grasp, is the limited computing power inside our heads. Whenever a number becomes too big, we resort to putting it into context. This is why "an elephant weighs as much as a bus" and "a whale is as large as three jumbo-jets".

XKCD: Monster

Zoom out a bit and we find that when "the sun is the size of a bowling ball, the earth is the size of a peppercorn. And they are about 25 meters apart!" This is already a little hard to grasp without going outside in a park and seeing how small a peppercorn is from 25 meters away. This, however, is NOTHING compared to how unimaginably small the chance of a GUID collision is!

Calculations

We can easily calculate the number of possible GUIDs. It is a finite number. There are 32 spots to fill (the four hyphens are fixed), and every spot can be one of 16 different characters. Therefore, the calculation is:

        Number of GUIDs: 16 x 16 x 16 x ... x 16 (32 factors) = 16^32 ≈ 3,4 * 10^38
    
So: 3,4 * 10^38.
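The same calculation can be checked in a couple of lines. This is only a sketch in Python (the constant names are chosen for this example); it relies on Python integers having arbitrary precision, so the result is exact:

```python
HEX_DIGITS = 32        # hexadecimal digits in a GUID, hyphens excluded
VALUES_PER_DIGIT = 16  # each digit is one of 0-9 or a-f

total_guids = VALUES_PER_DIGIT ** HEX_DIGITS

print(total_guids)           # the exact integer, all 39 digits of it
print(f"{total_guids:.1e}")  # the same number in scientific notation
```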

It is basically a short way to write 340.000.000.000.000.000.000.000.000.000.000.000.000. That is not a small number. In fact, it is so large that it enters the realm of madness. It is even hard to place in context because there is nothing we can intuitively compare it to. If it were kilograms, it would be the weight of 56.979.632.773.097 planet Earths. If it were millimeters, it would be 35.938.028.356.836.925.440 light-years, which in turn would be 239.586.855.712.246 times the width of our entire galaxy. Our brains break down and I hear you saying: "But… Computers… Super fast… Gigahertz processors… Millions of calculations every second…".
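For anyone who wants to redo these comparisons, here is a rough sketch in Python. The physical constants are assumptions made for this example (Earth's mass, the length of a light-year, and a galaxy width of 150.000 light-years); they are not stated in the text, so the results land close to the figures above rather than matching them digit for digit.

```python
TOTAL_GUIDS = 16 ** 32

# Assumed constants, not taken from the post.
EARTH_MASS_KG = 5.972e24
METERS_PER_LIGHT_YEAR = 9.4607e15
MILKY_WAY_WIDTH_LY = 150_000

# Read the number of GUIDs as kilograms and express it in Earth masses.
earths = TOTAL_GUIDS / EARTH_MASS_KG

# Read it as millimeters, convert to meters, then to light-years.
light_years = (TOTAL_GUIDS / 1000) / METERS_PER_LIGHT_YEAR

# Express that distance in galaxy widths.
galaxy_widths = light_years / MILKY_WAY_WIDTH_LY

print(f"{earths:.3e} planet Earths")
print(f"{light_years:.3e} light-years")
print(f"{galaxy_widths:.3e} galaxy widths")
```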

Let's dive into that. We give everyone on Earth a computer, six billion in all. Every computer will do nothing but generate GUIDs at the rate of one million per second. So that is 6.000.000.000 * 1.000.000 = 6.000.000.000.000.000 GUIDs per second. At that rate, we will have them all in only 5,6 * 10^22 seconds… Not too shabby! Until you realise this is 1.798.380.511.801.002 (or 1,8 QUADRILLION) years!

Of course, for a collision you do not need all of them. Via the same formula as the equally non-intuitive "Birthday Paradox", you can calculate how many GUIDs need to be generated before there is a 50% probability that at least two of them are the same: 2,71 * 10^18 (or 2,71 quintillion). That figure is for random (version 4) GUIDs, in which 122 of the 128 bits are actually random. With everybody in the world computing non-stop into one shared system, that would take only about 450 seconds, but no real system comes anywhere near that rate. A single computer generating one million GUIDs per second would need roughly 86.000 years.
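The arithmetic in this section can be redone with a short script. This is a sketch in Python under a few stated assumptions: a year is taken as 365 days, the collision estimate uses the common birthday approximation n ≈ sqrt(2 * N * ln(1 / (1 - p))), and the collision space is the 122 random bits of a version 4 GUID. The function and constant names are made up for this example.

```python
import math

TOTAL_GUIDS = 16 ** 32          # every possible 32-digit hexadecimal value
RANDOM_BITS_V4 = 122            # random bits in a version 4 GUID (assumption)
COMPUTERS = 6_000_000_000       # one per person, as in the text
RATE_PER_COMPUTER = 1_000_000   # GUIDs per second per computer
SECONDS_PER_YEAR = 365 * 24 * 3600

combined_rate = COMPUTERS * RATE_PER_COMPUTER


def guids_for_collision_probability(space_size: int, p: float = 0.5) -> float:
    """Birthday approximation: draws needed for probability p of at least one duplicate."""
    return math.sqrt(2 * space_size * math.log(1 / (1 - p)))


# Time for the whole world to enumerate every possible GUID.
exhaust_years = TOTAL_GUIDS / combined_rate / SECONDS_PER_YEAR

# Number of GUIDs at which a collision becomes a coin flip.
n_half = guids_for_collision_probability(2 ** RANDOM_BITS_V4)

# Time to generate that many: whole world combined vs. a single computer.
world_seconds = n_half / combined_rate
single_computer_years = n_half / RATE_PER_COMPUTER / SECONDS_PER_YEAR

print(f"Years to generate every GUID: {exhaust_years:.3e}")
print(f"GUIDs for a 50% collision chance: {n_half:.3e}")
print(f"Whole world, seconds to reach that: {world_seconds:.0f}")
print(f"Single computer, years to reach that: {single_computer_years:.0f}")
```

Changing `p` to something like 0.000001 shows how many GUIDs you can generate while keeping the collision risk at one in a million.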

Well-known fact: Every year, five people die because of shark attacks. So whenever you are tasked with writing code to test whether your newly generated GUID already exists in the system, ask whether the server farm has been secured against sharks first. They are WAY more likely to be a problem!

A GUID collision will just not happen. Period.
