There has been much philosophical debate about randomness. We should realize that we still don’t agree on an equally important question: what are probabilities? I distinguish between two dominant views:
- Determinism: There is no such thing as randomness. However, there are unpredictable events. If I throw a die, the result is predetermined, but unknowable to me. However, by symmetry arguments, I know that if I throw a fair die often enough, it will fall on any given side one time out of six. Probabilities are just abstractions: they are convenient mathematical artifacts that have no existence of their own.
- Indeterminism: Not only are some events unpredictable, they are truly random. Probabilities are first-class citizens of our universe.
Yet computers are overwhelmingly determinist1. Therefore probabilities are not first-class citizens in software. They are objectively unnecessary mathematical artifacts.
Is using unnecessary mathematical artifacts a good thing? It certainly looks good in a textbook, but it can have detrimental consequences:
- While mathematically convenient, probabilities can be harmful when solving problems. Here is my evidence: The famous Monty Hall problem has confused people for years because it asks us a probabilistic question. To recap the problem is as follows: behind one door out of three is a treasure. You pick a door. An agent picks one of the two remaining doors and tells you that there is no treasure behind it. Thus, two doors are left, the one you picked and the one nobody picked. Which is more likely to hide the treasure? Many people think that the two remaining doors are equally likely to hide the treasure. In fact, the door that you first picked has only a probability of 1/3 to hide the treasure, leaving 2/3 for the other door. Why are people confused? I don’t think it is a difficult puzzle. I believe that it is confusing only because we frame the problem in terms of probabilities. Let us revisit this problem from a determinist point of view. Instead of asking which is more likely to hide the treasure, let us ask a more practical question, free of the probabilistic point of view. With the above scenario, we repeat the experiments for every possible initial setup (all treasure locations). Then we ask which algorithm is best: keep the first door, or offer to switch to the second door. That is, you ask people to solve the problem without any use of probabilities. I believe that far more people would arrive at the right answer in this probability-free setup. It is a purely deterministic challenge. I claim that introducing probabilities is what makes it confusing.
- We sacrifice connectedness for mathematical elegance. Most textbooks prove that hash tables have expected constant-time access when hashing is universal. But universality is a probabilistic property which assumes that your hash functions are picked at random from a family of hash functions. Very few computer languages or software libraries implement hashing in this manner. Hashing is overwhelmingly deterministic. Neither your hashing nor your data is random. It is a compelling and elegant analysis, but it is not a correct model for how hash tables work. Thus, textbooks provide the wrong explanation! The correct description of a hash table appears in the Java API documentation: “constant-time performance (…) assuming the hash function disperses the elements properly among the buckets.” See? No probabilities.
Disclosure: I am a recovering indeterminist. Many of my most recent research papers have quite a bit of probabilities in them. I have been teaching probabilities to struggling Computer Science students for years.
1 There are some possible exceptions: (1) quantum computers are considered indeterminist, (2) computers have sensors which measure random physical properties such as tiny temperature variations (3) some hardware faults could be considered purely random.