Daniel Lemire is a computer science professor at the University of Quebec (Canada). His research is focused on software performance and data engineering. He is a techno-optimist.

Probabilities are unnecessary mathematical artifacts

While mathematically convenient, probabilities can be harmful when solving problems because hardly anyone can think correctly about them. Here is my evidence: the famous Monty Hall problem has confused people for years because it asks us a probabilistic question. To recap, the problem is as follows: behind one door out of three is a treasure. You pick a door. An agent picks one of the two remaining doors and tells you that there is no treasure behind it. Thus, two doors are left: the one you picked and the one nobody picked. Which is more likely to hide the treasure? Many people think that the two remaining doors are equally likely to hide the treasure. In fact, the door that you first picked has only a probability of 1/3 of hiding the treasure, leaving 2/3 for the other door. Why are people confused? I don’t think it is a difficult puzzle. I believe that it is confusing only because we frame the problem in terms of probabilities. Let us revisit this problem from a deterministic point of view. Instead of asking which door is more likely to hide the treasure, let us ask a more practical question, free of the probabilistic point of view. With the above scenario, we repeat the experiment for every possible initial setup (all treasure locations). Then we ask which algorithm is best: keep the first door, or switch to the second door. That is, you ask people to solve the problem without any use of probabilities. I believe that far more people would arrive at the right answer in this probability-free setup. It is a purely deterministic challenge. I claim that introducing probabilities is what makes it confusing.
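The probability-free reformulation can be checked by brute force. Here is a minimal sketch (the door numbering, function names, and the agent’s tie-breaking rule are my own) that enumerates every possible setup and counts how often each algorithm finds the treasure:

```python
from itertools import product

def stay(pick, opened):
    """Algorithm 1: keep the first door."""
    return pick

def switch(pick, opened):
    """Algorithm 2: take the one door that is neither picked nor opened."""
    return ({0, 1, 2} - {pick, opened}).pop()

def wins(algorithm):
    """Count wins over every possible setup: each treasure location paired
    with each initial pick. The agent is made deterministic by always
    opening the lowest-numbered empty, unpicked door; the choice does not
    affect either algorithm's win count."""
    total = 0
    for treasure, pick in product(range(3), repeat=2):
        opened = min({0, 1, 2} - {treasure, pick})
        total += algorithm(pick, opened) == treasure
    return total
```

Staying wins in 3 of the 9 setups and switching wins in 6, matching the 1/3 versus 2/3 split by pure counting, without a single probability.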

To make probabilities fit, we sacrifice correctness for mathematical elegance. Most textbooks prove that hash tables have expected constant-time access when hashing is universal. But universality is a probabilistic property which assumes that your hash functions are picked at random from a family of hash functions. Very few computer languages or software libraries implement hashing in this manner. Hashing is overwhelmingly deterministic. Neither your hashing nor your data is random. It is a compelling and elegant analysis, but it is not a correct model for how hash tables work. Thus, textbooks provide the wrong explanation! The correct description of a hash table appears in the Java API documentation: “constant-time performance (…) assuming the hash function disperses the elements properly among the buckets.” See? No probabilities.
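The Java documentation’s condition can be illustrated with a toy separate-chaining table (a sketch of my own, not any real library’s implementation): the cost of a lookup is the length of the chain it scans, so it is constant only when the hash function disperses the keys among the buckets.

```python
class ChainedTable:
    """A toy separate-chaining hash table, for illustration only."""

    def __init__(self, nbuckets, hash_fn):
        self.buckets = [[] for _ in range(nbuckets)]
        self.hash_fn = hash_fn

    def insert(self, key, value):
        self.buckets[self.hash_fn(key) % len(self.buckets)].append((key, value))

    def probes(self, key):
        """Entries scanned to find `key`: the cost of a lookup."""
        chain = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for position, (stored_key, _) in enumerate(chain, start=1):
            if stored_key == key:
                return position
        return len(chain) + 1  # not found: scanned the whole chain

dispersing = ChainedTable(64, hash)           # Python's hash spreads integers
degenerate = ChainedTable(64, lambda key: 0)  # everything lands in one bucket
for key in range(1000):
    dispersing.insert(key, key)
    degenerate.insert(key, key)
# Looking up key 999: the dispersing table scans a short chain, while
# the degenerate one scans all 1000 entries.
```

Both hash functions are perfectly deterministic; the difference in cost is entirely about dispersal, exactly as the Java documentation states.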

20 thoughts on “Probabilities are unnecessary mathematical artifacts”


With respect to the Monty Hall problem, isn’t the use of probabilities necessary, as a result of the initial random allocation of the prize to one of three doors? If the prize is not randomly (and uniformly) allocated, then the nature of the problem changes.

Hi Daniel,

I think I have to disagree quite strongly on multiple points.

First I think the Bayesian viewpoint is orthogonal to (in)determinism. I think determinism is a belief one holds about the real world. The Bayesian paradigm is just a consistent (and in my case also practical) mathematical formalism to reason about your beliefs. In the grand scale of things (i.e. barring quantum mechanics) I certainly believe in a deterministic world and don’t consider this to be against using Bayesian beliefs.

Secondly, I’m also highly skeptical of your second point. I don’t think more people would get the answer correct if you’d phrase it without using the word likely. But then again, only a real experiment can tell right?

Love the blog!

@Dean

The whole game can be organized as a purely deterministic challenge. Maybe the treasure is always behind door number 1. Maybe the user picks door 1 on Mondays, door 2 on Tuesdays, and so on.

@Itman

See my footnote where I address this very point (hardware failures).

@John

I have updated my post to remove the Bayesian/frequentist views which are not relevant.

@Jurgen

I agree that the Bayesian viewpoint is orthogonal to (in)determinism. I have updated my blog post to remove any allusion to Bayesians. It is not relevant.

@Mohamad

I agree that I misrepresented the frequentists and Bayesians. I have removed this discussion from my blog post.

I did not write (or believe) that quantum mechanics was indeterministic, I wrote that quantum computers were considered indeterministic.

@Paul

I mean that probabilities are a mathematical abstraction. I don’t recommend replacing them with brute-force enumeration of all possibilities. But you can think of it in terms of frequencies. For example, you can say “this algorithm will be best in 60% of all possible problems”.

Leslie Lamport wrote an article contending that hardware failures analogous to Buridan’s Ass were actually fundamental and unavoidable components of computing digitally using an analog medium.

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#buridan

When Buridan’s Ass rears its head* it seems like the resolution will be functionally random.

* – This metaphor went a bit wrong, but “ass rears its head” was too good to delete.

Computers are not deterministic. Just consider the rate of memory failures.

I expect skepticism regarding true randomness would be more common among Bayesians.

One interpretation of Bayesian statistics is that distributions represent uncertainty, and not necessarily randomness. According to this view, even if a parameter is constant, we may model it as random to reflect our uncertain knowledge of that parameter. Probability densities represent our uncertain knowledge of the world, and not necessarily physical randomness. Our uncertainty may come from physical randomness or from ignorance. Often the distinction doesn’t matter.

Some comments on your post…

1. The Bayesian and Frequentist views are misrepresented:

The Bayesian view is concerned with a model of our knowledge. This is why Bayes’ rule updates the probabilities based on new evidence. This makes Bayesian statistics a very handy tool for computational neuroscience.

The Frequentist views probabilities as abstractions to account for ignored hidden variables.

In other words, both models exist in a deterministic world: one focuses on our limitations based on our knowledge, and the other focuses on the limitations based on looking at a subset rather than the whole.

2. We often use probabilistic arguments for proofs because we sometimes simply don’t know how to prove anything otherwise. Examples abound. The entire field of derandomization is partially motivated by making proofs work without randomness. This also touches on hard complexity questions like BPP?=P.

3. The apparent ‘randomness’ in quantum computing is an artifact of decoherence. In fact, quantum mechanics is perfectly deterministic.

Regards,

Mohamad
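Point 1 can be made concrete: Bayes’ rule is itself a deterministic update of a model of knowledge given new evidence. A minimal sketch (the coin example and all names are my own):

```python
from fractions import Fraction

def bayes_update(prior, likelihood, heads):
    """Apply Bayes' rule once: reweight each hypothesis by how well it
    predicts the observation, then renormalize."""
    posterior = {}
    for hypothesis, p in prior.items():
        p_obs = likelihood[hypothesis] if heads else 1 - likelihood[hypothesis]
        posterior[hypothesis] = p * p_obs
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Two hypotheses about a coin: it is fair, or it lands heads 3/4 of the time.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

# Seeing one head shifts belief toward "biased": 1/2 -> 3/5, deterministically.
posterior = bayes_update(prior, likelihood, heads=True)
```

Nothing here is random: given the same prior and the same evidence, the update always produces the same posterior.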

@Mohamad: Quantum mechanics is causal, but certainly not deterministic. That’s the whole point of the Aspect experiments to verify Bell’s inequalities.

Certain events are truly random, but that doesn’t make probabilities first class citizens of our universe any more than the real number giving the height of my dog is a first class citizen of the universe.

Monty Hall confuses students because they haven’t yet internalized the importance of knowing what sigma algebra you’re measuring. It’s chosen to bring out exactly that problem. The student will arrive at a useful understanding by resolving the confusion. However, I think your approach of telling students to demonstrate which algorithm is better is a very good way to get them moving. It will naturally lead them through probabilities, though.

For your second point, the fact that the textbook provides an analysis which is irrelevant to most real world cases is hardly the fault of probability any more than the irrelevance of penguins sliding across frictionless ice in introductory physics is the fault of calculus.

A trio of points:

1. The Monty Hall problem is designed to trip people who have shallow understanding of probability; therefore it’s not surprising that it does.

2. As soon as software gets to a reasonable level of complexity, devising a test battery without an understanding of the likelihood of errors in its component parts is a futile exercise. Any complex piece of software will have more failure points than can ever be tested.

3. Software that manages communications, for example, needs to represent expected response times or failure rates of the network components it’s managing. After some comedic attempts at circumventing probabilities in the early 80s, almost all this representation is probabilistic.

Cheers,

JCS

I agree that computers are basically deterministic. But I don’t see that probability being unnecessary follows from that. Computers can still be unpredictable, so the choice can still turn out to be “say something about probability, or don’t say anything at all”. Even if in principle you could run every possible input to an algorithm up to some limit, that doesn’t mean the computing time to do so actually exists.

And if we’re willing to allow that certain physical phenomena may be non-deterministic, doesn’t it seem likely that there’s at least a hint of non-determinism in the human giving input to the computer? I know we do things like say “7” a lot when asked for random numbers and underestimate the odds of runs of consecutive numbers, but even if we’re biased, the choice between “6” and “7” could still be non-deterministic.

Just to keep we pedants happy, please change “dice” to “die” when it is used as a singular noun. I don’t understand why this seems so hard for most people to remember.

@Carr

Most finite probabilistic problems such as the one you describe can be solved by combinatorics. They are just “counting” problems.

This being said, I don’t have a problem with probabilities as a branch of mathematics. I use probabilities extensively as a researcher.

What bothers me is the unnecessary introduction of probabilities into a problem purely for the sake of elegance, at the expense of legibility and correctness.

@Mark

Thanks. I do have a valid excuse: I write in English as a second language.

Probabilities are useful mathematical tools, and everybody should know how to use them. The claim they are not “first-class” is practically irrelevant, like claiming that algebra is not “first-class” because algebraic variables aren’t the sort of variables used in software. In contrast, “random” is a semantically empty term used to sell the philosophy of science dust-jackets at Barnes&Noble.

The damage is done when “random” and “probability” are conflated. This is reminiscent of calling those things in computer programs “variables”. One of my favorite quotes from grad school: “Random variables are neither random nor variable — they are functions.” Not knowing how to use random variables is akin to not knowing how to use functions.

Please solve the following puzzle with an algorithm, kindly refraining from any overt reliance on the maths of probability, just as you so ably demonstrated with the Monty Hall puzzle: Grab a bag of M&M candies and a penny. Sit in a circle with N of your friends. Eat an M&M, flip the coin, on heads pass the bag left and on tails pass it right — repeat until only one remaining person has not gotten any candy. For each of your N friends, what is the probability they will be the last to get some?
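One way to attack this puzzle without the closed-form machinery of probability theory is exhaustive simulation. A minimal sketch (names and parameters are my own) suggests the perhaps surprising classical answer: every friend is equally likely to be last, roughly 1/N each.

```python
import random
from collections import Counter

def last_to_get_candy(n_friends, rng):
    """One game: a coin-flip walk on a circle of n_friends + 1 people
    (you are position 0). Returns the last person to receive the bag."""
    n = n_friends + 1
    pos, visited, last = 0, {0}, 0
    while len(visited) < n:
        pos = (pos + rng.choice((-1, 1))) % n  # heads: pass left; tails: right
        if pos not in visited:
            visited.add(pos)
            last = pos
    return last

def simulate(n_friends=5, trials=100_000, seed=42):
    """Empirical frequency with which each friend ends up last."""
    rng = random.Random(seed)
    counts = Counter(last_to_get_candy(n_friends, rng) for _ in range(trials))
    return {friend: counts[friend] / trials for friend in range(1, n_friends + 1)}

freqs = simulate()
# With 5 friends, every frequency comes out close to 1/5.
```

Of course, the simulation only estimates the answer; explaining *why* the distribution is uniform is where the commenter’s challenge really bites.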

With regards to the Monty Hall problem, there is another layer of possibly misleading abstraction that nobody here has raised.

The original Monty Hall puzzle was framed as a story where the “agent” is the host of a game-show. Given that, I would think that arriving at the “correct” answer would involve asking yourself questions like these:

1) Does the gameshow host know where the treasure is hidden?

2) If he does know where the treasure is hidden, is he allowed to use that knowledge to pick which door to open for me, or does he have to randomly choose between the two remaining doors?

3) If the gameshow host is allowed to use his knowledge of where the treasure is, then what is currently motivating him? Does he want me to win, in order to get the audience excited, or does he want me to lose, in order to get the audience to feel sympathy for me?

etc…

Most people in the street would focus on those fuzzy, psychology-related kinds of questions, as opposed to the neatly mathematical bits of the puzzle.

And in real-life, that’s probably the right thing to do.

@Mark

That should be “Just to keep *us* pedants happy” 🙂

Determinism is what I miss the most since coming to Google 🙂

http://yaroslavvb.blogspot.com/2011/06/embracing-non-determinism.html

Computers are, for all practical purposes, deterministic, but the world around them is not. Probabilistic reasoning is hugely important in certain fields such as robotics. For example, take a look at the first few chapters of Probabilistic Robotics (http://www.amazon.com/Probabilistic-Robotics-Intelligent-Autonomous-Agents/dp/0262201623).

1. Keeping focused on conditional probability makes the proper answer come out.

2. The elaboration of all 9 cases hides the assignment of equal probabilities to each case, an assumption which would benefit from being explicit.