Conditional probability

https://arbital.com/p/conditional_probability

by Eliezer Yudkowsky Jan 26 2016 updated Oct 8 2016

The notation for writing "The probability that someone has green eyes, if we know that they have red hair."


[summary(Brief): P(XY) means "The probability that X is true, assuming Y is true."

P(yellowbanana) is "the chance that something is yellow, given that we know it is a banana" or "the chance that a banana is yellow".

Conversely, P(bananayellow) expresses "how much we should think a random object is a banana, after being told that it was yellow."]

[summary: P(XY) means "The probability that X is true, assuming Y is true."

To calculate a conditional probability P(XY), we consider only the cases where Y is true, and ask about the cases where X is also true.

Suppose a barrel contains 15 round green marbles, 5 round blue marbles, 70 square green marbles, and 10 square blue marbles. "The probability that a marble is blue, after we've been told that it's round" or "The probability a round marble is blue" is calculated by restricting our attention to only the 20 round marbles, and asking about the 5 marbles that are both blue and round.

Letting P(blueround) denote "the probability a marble is both blue and round":

P(blueround):=P(blueround)P(round)=5% blue and round marbles20% round marbles=520=0.25.

In general, P(XY):=P(XY)P(Y).]

[summary(Technical): P(XY):=P(XY)P(Y) is the answer to the question, "Assuming Y to be true, what is the probability of X?" or "Constraining our attention to only possibilities where Y is true, what is the probability of XY inside those cases?" (Where XY denotes "X and Y" or "Both X and Y are true".)

Thus, P(observationhypothesis) would denote the likelihood of seeing some observation, if a hypothesis is true. P(hypothesisobservation) would denote the revised probability we ought to assign to a hypothesis, after learning that the observation was true.]

The conditional probability P(XY) means "The probability of X given Y." That is, P(leftright) means "The probability that left is true, assuming that right is true."

P(yellowbanana) is the probability that a banana is yellow - if we know something to be a banana, what is the probability that it is yellow?

P(bananayellow) is the probability that a yellow thing is a banana - if the right side is known to be yellow, then we ask the question on the left, what is the probability that this is a banana?

Definition

To obtain the probability P(leftright), we constrain our attention to only cases where right is true, and ask about cases within right where left is also true.

Let XY denote "X and Y" or "X and Y are both true". Then:

P(leftright)=P(leftright)P(right).

We can see this as a kind of "zooming in" on only the cases where right is true, and asking, within this universe, for the cases where right and left are true.

Example 1

Suppose you have a bag containing objects that are either red or blue, and either square or round, where the number of each is given by the following table:

RedBlueSquare12Round34

If you reach in and feel a round object, the conditional probability that it is red is given in by zooming in on only the round objects, and asking about the frequency of objects that are round and red inside this zoomed-in view:

P(redround)=P(redround)P(round)=33+4=37

If you look at the object nearest the top, and can see that it's blue, but not see the shape, then the conditional probability that it's a square is:

P(squareblue)=P(squareblue)P(blue)=22+4=13

conditional probabilities bag

Example 2

Suppose you're Sherlock Holmes investigating a case in which a red hair was left at the scene of the crime.

The Scotland Yard detective says, "Aha! Then it's Miss Scarlet. She has red hair, so if she was the murderer she almost certainly would have left a red hair there. P(redhairScarlet)=99%, let's say, which is a near-certain conviction, so we're done."

"But no," replies Sherlock Holmes. "You see, but you do not correctly track the meaning of the conditional probabilities, detective. The knowledge we require for a conviction is not P(redhairScarlet), the chance that Miss Scarlet would leave a red hair, but rather P(Scarletredhair), the chance that this red hair was left by Scarlet. There are other people in this city who have red hair."

"So you're saying…" the detective said slowly, "that P(redhairScarlet) is actually much lower than 1?"

"No, detective. I am saying that just because P(redhairScarlet) is high does not imply that P(Scarletredhair) is high. It is the latter probability in which we are interested - the degree to which, knowing that a red hair was left at the scene, we infer that Miss Scarlet was the murderer. This is not the same quantity as the degree to which, assuming Miss Scarlet was the murderer, we would guess that she might leave a red hair."

"But surely," said the detective, "these two probabilities cannot be entirely unrelated?"

"Ah, well, for that, you must read up on Bayes' rule."

Example 3

"Even if most Dark Wizards are from Slytherin, very few Slytherins are Dark Wizards. There aren't all that many Dark Wizards, so not all Slytherins can be one."

"So yeh're saying, that most Dark Wizards are Slytherins… but…"

"But most Slytherins are not Dark Wizards."

— Harry Potter and the Methods of Rationality, Ch. 100