Suppose that you assigned everyone a 19-digit number. What is the probability that two human beings would have the same number? It is an instance of the Birthday’s paradox.

Assuming that there are 8 billion people, the probability that at least two of them end up with the same number is given by the following table:

digits | probability |
---|---|

18 | 99.9% |

19 | 96% |

20 | 27% |

21 | 3% |

22 | 0.3% |

23 | 0.03% |

24 | 0.003% |

25 | 0.0003% |

If you want the probability to be effectively zero, you should use 30 digits or so.

Daniel Lemire, "19 random digits is not enough to uniquely identify all human beings," in *Daniel Lemire's blog*, April 12, 2023.

This reminds me of an old idea for hash tables, useful esp. with complex keys.

With 128-bit hash values, if the hash function is good, the probability of an actual key conflict on the same hash value is smaller than the probability of a server getting hit by a comet 🙂 So we don’t need to do the value equality check, possibly saving a lot of time.

Alas, I never trusted a hash function enough to actually apply this in a production system.

I love your comment. See my blog post… Hashing and the Birthday paradox: a cautionary tale.

It would be nice if you added the calculation for this probability.

Have you checked out the earlier blog post I link to?

https://lemire.me/blog/2019/12/12/are-64-bit-random-identifiers-free-from-collision/

Admittedly, it depends a bit on your assumptions and how you compute things

There are 3 considerations:

Size of a hash domain:

`2^128`

Number of distinct keys that might be used in the same environment where a collision might cause a problem – here one can argue, I think something like

`2^50`

is very generous.Probability of a meteorite hitting the Earth

From 1 and 2 we can get the probability of collision of roughly 1 in

`10^9`

.Assuming a civilization-destroying meteorite hits us every 1 million years, that means that a probability of it hitting on a given day is more than 1 in

`10^9`

.😀

There’s a very recent preprint addressing exactly this problem: https://arxiv.org/pdf/2304.07109.pdf