2. Bayes’ Theorem and Marginalisation

Bayes' theorem is one of the central rules of probabilistic reasoning. It allows us to update what we believe about an unknown event after observing new data.

In this course, Bayes' theorem is introduced as a practical tool for statistical inference. It helps us move from one question to another:

Forward question: if an assumption is true, how likely is this observation?
Inverse question: after seeing this observation, how plausible is the assumption?

This distinction matters because P(A|B) is generally not the same as P(B|A). For example, the probability that a detector gives an alert given that an event has occurred is not the same as the probability that the event has occurred given that the detector gave an alert.

Bayes' theorem

The basic relationship can be written as:

P(A and B) = P(A|B)P(B) = P(B|A)P(A)

Dividing both sides by P(B), which requires P(B) > 0, gives the more familiar form of Bayes' theorem:

P(A|B) = P(B|A)P(A) / P(B)

In this expression:

P(A) is the prior or base-rate probability of event A.
P(B|A) is the probability of observing B if A is true.
P(B) is the overall probability of observing B.
P(A|B) is the updated probability of A after observing B.

In Bayesian inference, this updated probability is called the posterior probability. It combines prior knowledge with the information contained in the observed data.
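As a minimal sketch of the formula above, the following Python snippet computes a posterior from a prior, a likelihood and the overall probability of the observation. The function name `posterior` and all numerical values are hypothetical and chosen only for illustration; they do not come from the course material.

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return likelihood_b_given_a * prior_a / evidence_b


# Hypothetical values, assumed for illustration only.
p_a = 0.2          # P(A): prior probability of the event
p_b_given_a = 0.9  # P(B|A): probability of an alert if the event occurred
p_b = 0.3          # P(B): overall probability of an alert

p_a_given_b = posterior(p_a, p_b_given_a, p_b)
print(f"P(B|A) = {p_b_given_a:.2f}, P(A|B) = {p_a_given_b:.2f}")
```

With these assumed numbers the forward probability P(B|A) is 0.9 while the inverse probability P(A|B) is about 0.6, which illustrates that the two conditional probabilities answer different questions.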

Independence

Two events are statistically independent if knowing that one occurred does not change the probability of the other. In that case:

P(A|B) = P(A)

and the joint probability simplifies to:

P(A and B) = P(A) × P(B)

This is a special case. In most real inference problems, the events are not independent, and the conditional probabilities carry important information.
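The independence condition can be checked numerically by comparing the joint probability with the product of the two individual probabilities. The sketch below assumes the three probabilities are already known; the helper names `conditional` and `is_independent` and the example values are hypothetical.

```python
import math


def conditional(p_a_and_b, p_b):
    """Return P(A|B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b


def is_independent(p_a, p_b, p_a_and_b, tol=1e-12):
    """Check whether P(A and B) equals P(A) * P(B) within a tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Hypothetical independent case: the joint equals the product 0.5 * 0.4.
print(is_independent(0.5, 0.4, 0.2))

# Hypothetical dependent case: the joint is larger than the product,
# so observing B raises the probability of A above its prior of 0.5.
print(is_independent(0.5, 0.4, 0.3))
print(conditional(0.3, 0.4))
```

In the dependent case P(A|B) is about 0.75 rather than the prior 0.5, so observing B carries information about A.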

The marginalisation rule

To use Bayes' theorem, we often need to calculate the overall probability of the observed data. This is where the marginalisation rule is useful.

The idea is to add together the probabilities of all the mutually exclusive ways in which an event could occur. Since A is either true or false, B occurs either together with A or together with not A, so:

P(B) = P(B|A)P(A) + P(B|not A)P(not A)

This is also known as the law of total probability. The important point is that each conditional probability must be weighted by how likely that case is. This is why rare events can lead to counter-intuitive results: when the event being tested for is very uncommon, even a small false-positive rate applies to the much larger group in which the event did not occur, so most positive results from an otherwise accurate test can be false alarms.
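The weighting described above can be sketched in Python for a generic rare-event test. The function names `marginal` and `posterior_from_rates` and all rates below are hypothetical assumptions made for illustration; they are not the numbers of any example in the course.

```python
def marginal(p_b_given_a, p_b_given_not_a, p_a):
    """Return P(B) = P(B|A) P(A) + P(B|not A) P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)


def posterior_from_rates(p_b_given_a, p_b_given_not_a, p_a):
    """Return P(A|B), using the marginalisation rule for the denominator."""
    p_b = marginal(p_b_given_a, p_b_given_not_a, p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical rare event and a test that looks accurate.
p_a = 0.001             # P(A): the event is present in 1 case out of 1000
p_b_given_a = 0.99      # P(B|A): the test is positive when the event is present
p_b_given_not_a = 0.05  # P(B|not A): false-positive rate

p_b = marginal(p_b_given_a, p_b_given_not_a, p_a)
p_a_given_b = posterior_from_rates(p_b_given_a, p_b_given_not_a, p_a)
print(f"P(B) = {p_b:.5f}")
print(f"P(A|B) = {p_a_given_b:.4f}")
```

Under these assumed rates the posterior is only about 2%, because the false positives from the 999 in 1000 cases without the event far outnumber the true positives.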

In the next example, this rule is used together with Bayes' theorem to solve the vampire problem.
