Suppose that you assigned everyone a 19-digit number. What is the probability that two human beings would have the same number? It is an instance of the Birthday’s paradox.
Assuming that there are 8 billion people, the probability that at least two of them end up with the same number is given by the following table:
digits | probability |
---|---|
18 | 99.9% |
19 | 96% |
20 | 27% |
21 | 3% |
22 | 0.3% |
23 | 0.03% |
24 | 0.003% |
25 | 0.0003% |
If you want the probability to be effectively zero, you should use 30 digits or so.
This reminds me of an old idea for hash tables, useful esp. with complex keys.
With 128-bit hash values, if the hash function is good, the probability of an actual key conflict on the same hash value is smaller than the probability of a server getting hit by a comet 🙂 So we don’t need to do the value equality check, possibly saving a lot of time.
Alas, I never trusted a hash function enough to actually apply this in a production system.
I love your comment. See my blog post… Hashing and the Birthday paradox: a cautionary tale.
It would be nice if you added the calculation for this probability.
Have you checked out the earlier blog post I link to?
https://lemire.me/blog/2019/12/12/are-64-bit-random-identifiers-free-from-collision/
Admittedly, it depends a bit on your assumptions and how you compute things
There are 3 considerations:
Size of a hash domain:
2^128
Number of distinct keys that might be used in the same environment where a collision might cause a problem – here one can argue, I think something like
2^50
is very generous.Probability of a meteorite hitting the Earth
From 1 and 2 we can get the probability of collision of roughly 1 in
10^9
.Assuming a civilization-destroying meteorite hits us every 1 million years, that means that a probability of it hitting on a given day is more than 1 in
10^9
.😀
There’s a very recent preprint addressing exactly this problem: https://arxiv.org/pdf/2304.07109.pdf