[summary:
We can represent a [joint_probability_distribution_on_event probability distribution] $\mathbb{P}(A, B)$ over two [event_probability events] $A$ and $B$ as a square:
We could also represent $\mathbb{P}$ by [factoring_probability factoring], so using, for example, $\mathbb{P}(A, B) = \mathbb{P}(A)\, \mathbb{P}(B \mid A)$ we'd make this picture:
]
Say we have two [event_probability events], $A$ and $B$, and a [joint_probability_distribution_on_event probability distribution] $\mathbb{P}$ over whether or not they happen. We can represent $\mathbb{P}$ as a square:
So for example, the probability $\mathbb{P}(A, B)$ of both $A$ and $B$ occurring is the ratio of [the area of the dark red region] to [the area of the entire square]:
Visualizing probabilities in a square is neat because we can draw simple pictures that highlight interesting facts about our probability distribution.
Below are some pictures illustrating:
independent events (What happens if the columns and the rows in our square both line up?)
[marginal_probability marginal probabilities] (If we're looking at a square of probabilities, where's the probability $\mathbb{P}(A)$ of $A$ or the probability $\mathbb{P}(\neg B)$?)
conditional probabilities (Can we find in the square the probability $\mathbb{P}(B \mid A)$ of $B$ if we condition on seeing $A$? What about the conditional probability $\mathbb{P}(A \mid B)$?)
[factoring_probability factoring a distribution] (Can we always write $\mathbb{P}$ as a square? Why do the columns line up but not the rows?)
the process of computing joint probabilities from [factoring_probability factored probabilities]
Independent events
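Here's a picture of the joint distribution of two independent events $A$ and $B$:
Now the rows for $\mathbb{P}(B \mid A)$ and $\mathbb{P}(B \mid \neg A)$ line up across the two columns. This is because $\mathbb{P}(B \mid A) = \mathbb{P}(B) = \mathbb{P}(B \mid \neg A)$. When $A$ and $B$ are independent, updating on $A$ or $\neg A$ doesn't change the probability of $B$.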
For more on this visualization of independent events, see the aptly named Two independent events: Square visualization.
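The "rows line up" condition can also be checked numerically. The following is a minimal Python sketch rather than anything from the original page: the helper name `row_heights` and both example tables are made up for illustration, and it assumes that A has probability strictly between 0 and 1, so that both columns have nonzero width.

```python
from fractions import Fraction as F

def row_heights(joint):
    """Return (P(B | A), P(B | not A)) for a joint table keyed by (a, b).

    Assumes 0 < P(A) < 1, so neither column has zero width.
    """
    p_a = joint[(True, True)] + joint[(True, False)]
    p_not_a = joint[(False, True)] + joint[(False, False)]
    return joint[(True, True)] / p_a, joint[(False, True)] / p_not_a

# Hypothetical independent table: P(A) = 1/3, P(B) = 1/4.
independent = {
    (True, True): F(1, 12), (True, False): F(1, 4),
    (False, True): F(1, 6), (False, False): F(1, 2),
}
# Hypothetical dependent table with the same P(A) = 1/3.
dependent = {
    (True, True): F(1, 12), (True, False): F(1, 4),
    (False, True): F(1, 3), (False, False): F(1, 3),
}

print(row_heights(independent))  # both entries equal: the rows line up
print(row_heights(dependent))    # entries differ: the dividing line is jagged
```

Exact fractions are used here so that the equality test is not disturbed by floating-point rounding.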
Marginal probabilities
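We can see the [marginal_probability marginal probabilities] of $A$ and $B$ by looking at some of the blocks in our square. For example, to find the probability $\mathbb{P}(\neg A)$ that $A$ doesn't occur, we just need to add up all the blocks where $\neg A$ happens: $\mathbb{P}(\neg A) = \mathbb{P}(\neg A, B) + \mathbb{P}(\neg A, \neg B)$.
Here's the probability $\mathbb{P}(A)$ of $A$, and the probability $\mathbb{P}(\neg A)$ of $\neg A$:
Here's the probability $\mathbb{P}(\neg B)$ of $\neg B$: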
In these pictures we're dividing by the area of the whole square. Since the probability of anything at all happening is 1, we could just leave it out, but it'll be helpful for comparison while we think about conditionals next.
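As a small numerical companion to these pictures, here is a Python sketch that adds up blocks to get marginals. The joint table and the function names `marginal_a` and `marginal_b` are hypothetical, chosen only for illustration; the division by the whole square is kept explicit to mirror the pictures, even though it equals 1.

```python
from fractions import Fraction as F

# Hypothetical joint table keyed by (a, b) truth values; the numbers are made up.
joint = {
    (True, True): F(1, 12), (True, False): F(1, 4),
    (False, True): F(1, 3), (False, False): F(1, 3),
}

def marginal_a(joint, a):
    """P(A = a): total area of the blocks where A has truth value a."""
    return sum(p for (t_a, _), p in joint.items() if t_a == a)

def marginal_b(joint, b):
    """P(B = b): total area of the blocks where B has truth value b."""
    return sum(p for (_, t_b), p in joint.items() if t_b == b)

whole_square = sum(joint.values())  # 1 for any probability distribution

print(marginal_a(joint, False) / whole_square)  # P(not A)
print(marginal_b(joint, False) / whole_square)  # P(not B)
```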
Conditional probabilities
We can start with some probability $\mathbb{P}(B)$, and then assume that $A$ is true to get a conditional probability $\mathbb{P}(B \mid A)$ of $B$. Conditioning on $A$ being true is like restricting our whole attention to just the possible worlds where $A$ happens:
Then the conditional probability of $B$ given $A$ is the proportion of these $A$ worlds where $B$ also happens:
If instead we condition on $\neg A$, we get:
So our square visualization gives a nice way to see, at a glance, the conditional probabilities of $B$ given $A$ or given $\neg A$:
We don't get such nice pictures for $\mathbb{P}(A \mid B)$:
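Numerically, conditioning means dividing by the area we have restricted attention to instead of by the whole square. The sketch below illustrates this under assumptions: the joint table, the predicate helpers, and the function name `conditional` are all hypothetical, and the conditioning event is assumed to have nonzero probability.

```python
from fractions import Fraction as F

# Hypothetical joint table keyed by (a, b) truth values; the numbers are made up.
joint = {
    (True, True): F(1, 12), (True, False): F(1, 4),
    (False, True): F(1, 3), (False, False): F(1, 3),
}

def a_true(ab):
    return ab[0]

def a_false(ab):
    return not ab[0]

def b_true(ab):
    return ab[1]

def conditional(joint, target, given):
    """P(target | given), where both arguments are predicates on (a, b).

    Assumes the `given` event has nonzero probability.
    """
    given_area = sum(p for ab, p in joint.items() if given(ab))
    both_area = sum(p for ab, p in joint.items() if given(ab) and target(ab))
    return both_area / given_area

p_b_given_a = conditional(joint, b_true, a_true)
p_b_given_not_a = conditional(joint, b_true, a_false)
p_a_given_b = conditional(joint, a_true, b_true)
print(p_b_given_a, p_b_given_not_a, p_a_given_b)
```

The arithmetic is the same in every case. Only the picture differs: in a square factored by A, the B blocks do not sit side by side, which is why the last quantity is harder to see at a glance.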
Factoring a distribution
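Recall the square showing our joint distribution $\mathbb{P}$:
Notice that in the above square, the reddish blocks for $\mathbb{P}(A, B)$ and $\mathbb{P}(A, \neg B)$ are the same width and form a column; and likewise the blueish blocks for $\mathbb{P}(\neg A, B)$ and $\mathbb{P}(\neg A, \neg B)$. This is because we chose to [factoring_probability factor] our probability distribution starting with $A$:
$$\mathbb{P}(A, B) = \mathbb{P}(A)\, \mathbb{P}(B \mid A).$$
Let's use the [event_variable_equivalence equivalence] between [event_probability events] and [binary_variable binary random variables], so if we say $\mathbb{P}(B = \text{True} \mid A = \text{False})$ we mean $\mathbb{P}(B \mid \neg A)$. For any choice of truth values $t_A \in \{\text{True}, \text{False}\}$ and $t_B \in \{\text{True}, \text{False}\}$, we have
$$\mathbb{P}(A = t_A, B = t_B) = \mathbb{P}(A = t_A)\, \mathbb{P}(B = t_B \mid A = t_A).$$
The first factor $\mathbb{P}(A = t_A)$ tells us how wide to make the red column ($\mathbb{P}(A = \text{True})$) relative to the blue column ($\mathbb{P}(A = \text{False})$). Then the second factor $\mathbb{P}(B = t_B \mid A = t_A)$ tells us the proportions of dark ($B = \text{True}$) and light ($B = \text{False}$) within the column for $A = t_A$.
We could just as well have factored by $B$ first:
$$\mathbb{P}(A = t_A, B = t_B) = \mathbb{P}(B = t_B)\, \mathbb{P}(A = t_A \mid B = t_B).$$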
Then we'd draw a picture like this:
By the way, earlier when we factored by $A$ first, we got simple pictures of the probabilities $\mathbb{P}(B \mid A)$ for $B$ conditioned on $A$. Now that we're factoring by $B$ first, we have simple pictures for the conditional probability $\mathbb{P}(A \mid B)$:
and for the conditional probability $\mathbb{P}(A \mid \neg B)$:
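To see that factoring by B first describes the same distribution, the following Python sketch extracts the B-first factors from a joint table and multiplies them back together. The table and the function name `factor_by_b` are hypothetical, and the sketch assumes that B has probability strictly between 0 and 1.

```python
from fractions import Fraction as F

# Hypothetical joint table keyed by (a, b) truth values; the numbers are made up.
joint = {
    (True, True): F(1, 12), (True, False): F(1, 4),
    (False, True): F(1, 3), (False, False): F(1, 3),
}

def factor_by_b(joint):
    """Return (P(B), P(A | B), P(A | not B)); assumes 0 < P(B) < 1."""
    p_b = joint[(True, True)] + joint[(False, True)]
    p_not_b = joint[(True, False)] + joint[(False, False)]
    return p_b, joint[(True, True)] / p_b, joint[(True, False)] / p_not_b

p_b, p_a_given_b, p_a_given_not_b = factor_by_b(joint)

# Rebuild each block as P(B = t_B) * P(A = t_A | B = t_B).
rebuilt = {
    (True, True): p_b * p_a_given_b,
    (False, True): p_b * (1 - p_a_given_b),
    (True, False): (1 - p_b) * p_a_given_not_b,
    (False, False): (1 - p_b) * (1 - p_a_given_not_b),
}
print(rebuilt == joint)
```

Both factorings carry the same four block areas. What changes is which variable's blocks are grouped into straight strips in the drawing.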
Computing joint probabilities from factored probabilities
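Let's say we know the factored probabilities for $A$ and $B$, factoring by $A$. That is, we know $\mathbb{P}(A = \text{True})$, and we also know $\mathbb{P}(B = \text{True} \mid A = \text{True})$ and $\mathbb{P}(B = \text{True} \mid A = \text{False})$. How can we recover the joint probability $\mathbb{P}(A = t_A, B = t_B)$ that $A = t_A$ is the case and also $B = t_B$ is the case?
Since
$$\mathbb{P}(B = t_B \mid A = t_A) = \frac{\mathbb{P}(A = t_A, B = t_B)}{\mathbb{P}(A = t_A)}$$
we can multiply the prior $\mathbb{P}(A = t_A)$ by the conditional $\mathbb{P}(B = t_B \mid A = t_A)$ to get the joint $\mathbb{P}(A = t_A, B = t_B)$:
$$\mathbb{P}(A = t_A)\, \mathbb{P}(B = t_B \mid A = t_A) = \mathbb{P}(A = t_A, B = t_B).$$
If we do this at the same time for all the possible truth values $t_A$ and $t_B$, we get back the full joint distribution: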
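The same multiplication can be written out as a short Python sketch. The function name `joint_from_factors` and the input values 1/3, 1/4 and 1/2 are assumptions made for illustration, not numbers taken from the pictures.

```python
from fractions import Fraction as F

def joint_from_factors(p_a, p_b_given_a, p_b_given_not_a):
    """Multiply prior by conditional for every pair of truth values (t_a, t_b)."""
    prior = {True: p_a, False: 1 - p_a}
    p_b_true_given = {True: p_b_given_a, False: p_b_given_not_a}
    joint = {}
    for t_a in (True, False):
        for t_b in (True, False):
            # P(B = t_b | A = t_a); the False case is the complement.
            cond = p_b_true_given[t_a] if t_b else 1 - p_b_true_given[t_a]
            joint[(t_a, t_b)] = prior[t_a] * cond
    return joint

# Hypothetical factored probabilities, factoring by A.
joint = joint_from_factors(F(1, 3), F(1, 4), F(1, 2))
for (t_a, t_b), p in joint.items():
    print(f"P(A={t_a}, B={t_b}) = {p}")
```

Only three numbers go in, because the remaining factors are complements of those three, and the four resulting blocks always sum to 1.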
[todo: information theory. a couple things, then point to another page. eg show example when two things have lots of mutual info.]
Comments
Eric Rogstad
Does this actually work for any proportions of A and B? Is there a simple proof?
Eric Rogstad
And is there any significance to the fact that A and -A are divided by a straight line, but B and -B are divided by a jagged line? Could we have arranged the rectangle so that B and -B were divided by a straight line w/o changing any of the probabilities?
Tsvi BT
Yes, but I'm not sure it's worth proving? I'd say that the "Factoring" section explains how this works, though there are no proofs. Will add pointers at the beginning.
This is addressed in the factoring section.