[summary(Brief): $\mathbb{P}(X\mid Y)$ means "The probability that X is true, assuming Y is true."

$\mathbb{P}(yellow\mid banana)$ is "the chance that something is yellow, given that we know it is a banana" or "the chance that a banana is yellow".

Conversely, $\mathbb{P}(banana\mid yellow)$ expresses "how much we should think a random object is a banana, after being told that it was yellow."]

[summary: $\mathbb{P}(X\mid Y)$ means "The probability that X is true, assuming Y is true."

$\mathbb{P}(yellow\mid banana)$ is "the chance that something is yellow, given that we know it is a banana" or equivalently "the chance that a banana is yellow".
$\mathbb{P}(banana\mid yellow)$ expresses "how much we should think a random object is a banana, after being told that it was yellow" or "the chance that a yellow thing is a banana".

To calculate a conditional probability $\mathbb{P}(X\mid Y)$, we consider only the cases where Y is true, and ask what fraction of those cases also have X true.

Suppose a barrel contains 15 round green marbles, 5 round blue marbles, 70 square green marbles, and 10 square blue marbles. "The probability that a marble is blue, after we've been told that it's round" or "The probability a round marble is blue" is calculated by restricting our attention to only the 20 round marbles, and asking about the 5 marbles that are both blue and round.

Letting $\mathbb{P}(blue \wedge round)$ denote "the probability a marble is both blue and round":

$\mathbb{P}(blue\mid round) := \frac{\mathbb{P}(blue \wedge round)}{\mathbb{P}(round)} = \frac{\text{5\% blue and round marbles}}{\text{20\% round marbles}} = \frac{5}{20} = 0.25.$

In general, $\mathbb{P}(X\mid Y) := \frac{\mathbb{P}(X \wedge Y)}{\mathbb{P}(Y)}.$ ]

[summary(Technical): $\mathbb{P}(X\mid Y) := \frac{\mathbb{P}(X \wedge Y)}{\mathbb{P}(Y)}$ is the answer to the question, "Assuming $Y$ to be true, what is the probability of $X$ ?" or "Constraining our attention to only possibilities where $Y$ is true, what is the probability of $X \wedge Y$ inside those cases?" (Where $X \wedge Y$ denotes "X and Y" or "Both X and Y are true".)

Thus, $\mathbb P(observation\mid hypothesis)$ would denote the likelihood of seeing some observation, if a hypothesis is true. $\mathbb P(hypothesis\mid observation)$ would denote the revised probability we ought to assign to a hypothesis, after learning that the observation was true.]

The conditional probability $\mathbb{P}(X\mid Y)$ means "The probability of $X$ given $Y$." That is, $\mathbb P(left\mid right)$ means "The probability that $left$ is true, assuming that $right$ is true."

$\mathbb P(yellow\mid banana)$ is the probability that a banana is yellow: if we know something to be a banana, what is the probability that it is yellow?

$\mathbb P(banana\mid yellow)$ is the probability that a yellow thing is a banana: if we know something to be $yellow$ (the right side), what is the probability that it is a $banana$ (the left side)?

Definition

To obtain the probability $\mathbb P(left \mid right),$ we constrain our attention to only cases where $right$ is true, and ask about cases within $right$ where $left$ is also true.

Let $X \wedge Y$ denote "$X$ and $Y$" or "$X$ and $Y$ are both true". Then:

$\mathbb P(left \mid right) = \dfrac{\mathbb P(left \wedge right)}{\mathbb P(right)}.$

We can see this as a kind of "zooming in" on only the cases where $right$ is true, and asking what fraction of this smaller universe consists of cases where $left$ is also true.
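As a minimal sketch of this definition, the Python below recomputes the marble-barrel figure from the summary, $\mathbb P(blue\mid round)$. The names `BARREL`, `probability`, and `conditional_probability` are made up for this illustration, and the sketch assumes every marble is equally likely to be drawn.

```python
from fractions import Fraction

# Marble barrel from the summary: (shape, colour) -> number of marbles.
BARREL = {
    ("round", "green"): 15,
    ("round", "blue"): 5,
    ("square", "green"): 70,
    ("square", "blue"): 10,
}


def probability(counts, event):
    """P(event): the share of all objects for which event(outcome) is true."""
    total = sum(counts.values())
    matching = sum(n for outcome, n in counts.items() if event(outcome))
    return Fraction(matching, total)


def conditional_probability(counts, left, right):
    """P(left | right) = P(left and right) / P(right)."""
    p_right = probability(counts, right)
    if p_right == 0:
        raise ValueError("P(right) is zero, so P(left | right) is undefined")
    p_both = probability(counts, lambda o: left(o) and right(o))
    return p_both / p_right


def is_blue(outcome):
    return outcome[1] == "blue"


def is_round(outcome):
    return outcome[0] == "round"


p_blue_given_round = conditional_probability(BARREL, is_blue, is_round)
print(p_blue_given_round)  # 1/4
```

Exact fractions are used instead of floats so that the result can be compared directly with the hand calculation.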

Example 1

Suppose you have a bag containing objects that are either red or blue, and either square or round, where the number of each is given by the following table:

$\begin{array}{l|r|r} & Red & Blue \\ \hline Square & 1 & 2 \\ \hline Round & 3 & 4 \end{array}$

If you reach in and feel a round object, the conditional probability that it is red is given by zooming in on only the round objects, and asking about the frequency of objects that are round and red inside this zoomed-in view:

$\mathbb P(red\mid round) = \dfrac{\mathbb P(red \wedge round)}{\mathbb P(round)} = \dfrac{3}{3 + 4} = \dfrac{3}{7}$

If you look at the object nearest the top, and can see that it's blue, but not see the shape, then the conditional probability that it's a square is:

$\mathbb P(square\mid blue) = \dfrac{\mathbb P(square \wedge blue)}{\mathbb P(blue)} = \dfrac{2}{2 + 4} = \dfrac{1}{3}$
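The two calculations above can be checked with a short sketch that works directly on the counts in the table. Both the numerator and the denominator of each ratio are counts out of the same 10 objects, so the total cancels and only the zoomed-in counts matter. The name `COUNTS` and its nested layout are choices made for this illustration.

```python
from fractions import Fraction

# Counts from the table in Example 1, stored as COUNTS[shape][colour].
COUNTS = {
    "square": {"red": 1, "blue": 2},
    "round": {"red": 3, "blue": 4},
}

# Zoom in on the round row, then ask what fraction of it is red.
round_total = sum(COUNTS["round"].values())  # 3 + 4 = 7
p_red_given_round = Fraction(COUNTS["round"]["red"], round_total)

# Zoom in on the blue column, then ask what fraction of it is square.
blue_total = sum(row["blue"] for row in COUNTS.values())  # 2 + 4 = 6
p_square_given_blue = Fraction(COUNTS["square"]["blue"], blue_total)

print(p_red_given_round)    # 3/7
print(p_square_given_blue)  # 1/3
```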


Example 2

Suppose you're Sherlock Holmes investigating a case in which a red hair was left at the scene of the crime.

The Scotland Yard detective says, "Aha! Then it's Miss Scarlet. She has red hair, so if she was the murderer she almost certainly would have left a red hair there. $\mathbb P(redhair\mid Scarlet) = 99\%,$ let's say, which is a near-certain conviction, so we're done."

"But no," replies Sherlock Holmes. "You see, but you do not correctly track the meaning of the conditional probabilities, detective. The knowledge we require for a conviction is not $\mathbb P(redhair\mid Scarlet),$ the chance that Miss Scarlet would leave a red hair, but rather $\mathbb P(Scarlet\mid redhair),$ the chance that this red hair was left by Scarlet. There are other people in this city who have red hair."

"So you're saying…" the detective said slowly, "that $\mathbb P(redhair\mid Scarlet)$ is actually much lower than $1$ ?"

"No, detective. I am saying that just because $\mathbb P(redhair\mid Scarlet)$ is high does not imply that $\mathbb P(Scarlet\mid redhair)$ is high. It is the latter probability in which we are interested - the degree to which, knowing that a red hair was left at the scene, we infer that Miss Scarlet was the murderer. This is not the same quantity as the degree to which, assuming Miss Scarlet was the murderer, we would guess that she might leave a red hair."

"But surely," said the detective, "these two probabilities cannot be entirely unrelated?"

"Ah, well, for that, you must read up on Bayes' rule."

Example 3

"Even if most Dark Wizards are from Slytherin, very few Slytherins are Dark Wizards. There aren't all that many Dark Wizards, so not all Slytherins can be one."

"So yeh're saying, that most Dark Wizards are Slytherins… but…"

"But most Slytherins are not Dark Wizards."

— Harry Potter and the Methods of Rationality, Ch. 100
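To see how far apart the two directions of a conditional probability can be, here is a sketch with made-up head counts. None of these numbers come from the story; they are assumptions chosen only so that both statements in the quotation hold at once. The same asymmetry is what Holmes points out in Example 2.

```python
from fractions import Fraction

# Made-up head counts, chosen only for illustration (not from the story).
WIZARDS = 1000
SLYTHERINS = 250
DARK_WIZARDS = 20
DARK_SLYTHERINS = 15  # wizards who are both Dark and in Slytherin

p_dark_and_slytherin = Fraction(DARK_SLYTHERINS, WIZARDS)
p_dark = Fraction(DARK_WIZARDS, WIZARDS)
p_slytherin = Fraction(SLYTHERINS, WIZARDS)

# Same numerator, different denominators: that is the whole asymmetry.
p_slytherin_given_dark = p_dark_and_slytherin / p_dark
p_dark_given_slytherin = p_dark_and_slytherin / p_slytherin

print(p_slytherin_given_dark)  # 3/4
print(p_dark_given_slytherin)  # 3/50
```

Under these assumed counts most Dark Wizards are Slytherins (3 in 4), yet very few Slytherins are Dark Wizards (3 in 50), because there are far more Slytherins than Dark Wizards to divide by.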