Interpretations of Probabilities
Objective Probability
Objective probabilities apply to repeatable events, such as dice rolls or coin flips. These can be modelled as happening many times, with the probability of an outcome interpreted as its long-run frequency.
Subjective Probability
Subjective probabilities apply to unrepeatable events, such as the winner of a specific competition. In this interpretation, all uncertain events are treated probabilistically. While it may seem counter-intuitive, or even inaccurate, to model one-off events probabilistically, the approach works very well in practice.
Bruno de Finetti’s Theorem
This theorem states that, given a person who
- Has a set of beliefs which violate the axioms of probability, and
- Is willing to accept any fair bet on these beliefs,
then there exists a set of bets which he will accept, in which he is guaranteed (i.e. with probability $1$) to lose money. This means that irrational beliefs can be exploited.
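As an illustrative sketch (the numbers are hypothetical, not from the source), suppose a bettor assigns probability $0.6$ to an event and also $0.6$ to its complement, violating the axiom that these must sum to $1$. Selling him a "fair" bet on each side at those prices guarantees he loses:

```python
# Dutch book sketch: incoherent beliefs P(A) = 0.6 and P(not A) = 0.6
# (these sum to 1.2, violating the probability axioms).
# A bet paying 1 unit if an event occurs is "fair" to someone who
# believes the event has probability p when the bet costs p.

p_A = 0.6        # bettor's belief that A happens
p_not_A = 0.6    # bettor's belief that A does not happen

# The bettor accepts both bets, each paying 1 if its event occurs.
cost = p_A + p_not_A    # total stake paid: 1.2
payout = 1.0            # exactly one of A, not-A occurs, paying 1

net = payout - cost     # bettor's net result in EVERY outcome
print(net)              # -0.2: a guaranteed loss
```

Whatever happens, the bettor is down $0.2$, which is exactly the amount by which his beliefs over-count the total probability.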
Notation
For a random variable $X$ over a set $\mathcal{X}$, $P(X = x)$ can be written shorthand as $P(x)$.
The probability of "$X = x$" and "$Y = y$", often written as $P(x \cap y)$, is written here as $P(x, y)$.
If $\mathcal{X}$ is finite or countably infinite, then $P(x)$ is a probability mass function (PMF). If $\mathcal{X}$ is continuous, then a probability density function (PDF) $p(x)$ exists instead. In the case of a PDF, $p(x)$ is not the probability that $X = x$, as this would be $0$ for all $x$. Instead,
$$P(a \le X \le b) = \int_a^b p(x)\,dx$$
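A quick numerical check of this idea, using the standard normal distribution as an example (the CDF can be written in terms of the error function, so the standard library suffices):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# For a continuous X, p(x) alone is not a probability; probabilities
# come from integrating the density over an interval.
prob = normal_cdf(1.0) - normal_cdf(-1.0)   # P(-1 <= X <= 1)
print(round(prob, 4))                        # ~0.6827 for a standard normal
```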
Bayes’ Rule
The definition of conditional probability is:
$$P(x \mid y) = \frac{P(x, y)}{P(y)}$$
This can be re-arranged to give
$$P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x)$$
From this, Bayes' rule follows:
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
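The rule can be verified numerically on a small joint distribution (the table below is made up for illustration):

```python
# A toy joint distribution P(x, y); entries sum to 1.
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

def P(pred):
    """Total probability of all (x, y) pairs satisfying a predicate."""
    return sum(p for xy, p in joint.items() if pred(xy))

P_x1 = P(lambda xy: xy[0] == "x1")      # marginal P(x1)
P_y0 = P(lambda xy: xy[1] == "y0")      # marginal P(y0)
P_x1_y0 = joint[("x1", "y0")]           # joint P(x1, y0)

# Conditionals from the definition P(x | y) = P(x, y) / P(y):
P_x1_given_y0 = P_x1_y0 / P_y0
P_y0_given_x1 = P_x1_y0 / P_x1

# Bayes' rule: P(x | y) = P(y | x) P(x) / P(y)
bayes = P_y0_given_x1 * P_x1 / P_y0
print(P_x1_given_y0, bayes)   # both equal 0.2 / 0.3, about 0.667
```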
Marginalisation
From a joint distribution, one of the two variables can be integrated out, or “marginalised”.
Discrete
$$P(x) = \sum_{y} P(x, y)$$
Continuous
$$p(x) = \int p(x, y)\,dy$$
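The discrete case is a straightforward sum over the unwanted variable. A minimal sketch, reusing a made-up joint table:

```python
# Marginalise y out of a toy joint distribution: P(x) = sum_y P(x, y).
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

P_x = {}
for (x, y), p in joint.items():
    P_x[x] = P_x.get(x, 0.0) + p   # accumulate over all values of y

print(P_x)   # {'x0': 0.4, 'x1': 0.6}
```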
Independence
If $X$ and $Y$ are independent, then $P(x, y) = P(x)\,P(y)$.
Conditional Independence
Sometimes, two variables may be independent only when conditioned on a third variable. For example, given two variables $X$ and $Y$ which are independent when conditioned on a third variable $Z$,
$$P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$$
However, this does not imply that
$$P(x, y) = P(x)\,P(y)$$
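A concrete example of this distinction, with hypothetical numbers: pick a coin $Z$ (fair or biased), then flip it twice, giving $X$ and $Y$. Given the coin, the flips are independent; marginally, they are not, because both flips carry information about which coin was chosen.

```python
from itertools import product

P_z = {"fair": 0.5, "biased": 0.5}       # prior over which coin is used
P_heads = {"fair": 0.5, "biased": 0.9}   # P(heads | z)

def P_xy_given_z(x, y, z):
    """Conditional independence built in: flips factorise given z."""
    px = P_heads[z] if x == "H" else 1 - P_heads[z]
    py = P_heads[z] if y == "H" else 1 - P_heads[z]
    return px * py

# Marginal joint: P(x, y) = sum_z P(x, y | z) P(z)
P_xy = {(x, y): sum(P_xy_given_z(x, y, z) * P_z[z] for z in P_z)
        for x, y in product("HT", repeat=2)}
P_x = {x: sum(P_xy[(x, y)] for y in "HT") for x in "HT"}
P_y = {y: sum(P_xy[(x, y)] for x in "HT") for y in "HT"}

# P(H, H) = 0.53, but P(H) * P(H) = 0.49: marginal independence fails.
print(P_xy[("H", "H")], P_x["H"] * P_y["H"])
```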
Terminology
Writing $H$ for a hypothesis and $D$ for the observed data:
- Posterior: $P(H \mid D)$ - this is usually the goal.
- Likelihood: $P(D \mid H)$
- Prior: $P(H)$
- The normalisation (or evidence): $P(D)$ - this is often difficult to calculate.
Put more simply, the prior is the belief before the analysis of evidence, and the posterior is the belief after. Bayes’ rule is used to update the belief as a result of new evidence, and in the context of the prior.
Computing the Evidence
The evidence, $P(D)$, can be calculated as a marginalisation over all possible hypotheses. For a finite number of hypotheses, given by $H_1, \dots, H_n$,
$$P(D) = \sum_{i=1}^{n} P(D \mid H_i)\,P(H_i)$$
This computation is often intractable.
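For a small hypothesis set the sum is direct. A minimal sketch with illustrative numbers (the priors and likelihoods are invented):

```python
# Evidence as marginalisation over a finite hypothesis set:
# P(D) = sum_i P(D | H_i) P(H_i).
priors = [0.5, 0.3, 0.2]          # P(H_i), summing to 1
likelihoods = [0.01, 0.2, 0.5]    # P(D | H_i)

evidence = sum(l * p for l, p in zip(likelihoods, priors))

# The evidence normalises the posteriors so they sum to 1.
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
print(evidence, posteriors)
```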
The Prosecutor’s Fallacy
Confusing the likelihood with the posterior, and ignoring the prior, is known as The Prosecutor’s Fallacy, and is a common mistake. It can be easily seen in the following example:
- Given a population of 1 million people, all of whom have DNA recorded in a database,
- A crime is committed, and is known to have been committed by one of the members of the population.
- DNA evidence is recovered, and identified as a specific individual from a process which produces false negatives and false positives with a frequency of 1 in a million.
- Properly analysing the data, with a uniform prior of $\frac{1}{1{,}000{,}000}$ over the population, reveals that the probability that the DNA does belong to the identified person is only about $\frac{1}{2}$.
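The steps above can be checked with Bayes' rule directly: the tiny prior of being the culprit cancels against the tiny false-positive rate, leaving roughly even odds rather than near-certain guilt.

```python
# Bayes for the DNA example: population of 1,000,000, error rate 1e-6.
prior = 1 / 1_000_000                 # P(guilty) before the DNA evidence
p_match_given_guilty = 1 - 1e-6       # test rarely misses the true source
p_match_given_innocent = 1e-6         # false positive rate

# Evidence: P(match) over both hypotheses.
evidence = (p_match_given_guilty * prior
            + p_match_given_innocent * (1 - prior))

posterior = p_match_given_guilty * prior / evidence
print(posterior)   # ~0.5, not "1 in a million chance of innocence"
```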
Statistics as Parameter Estimation
If we interpret the possible parameters for a given probability distribution (e.g. $\mu$ and $\sigma$ for the normal distribution, or $n$ and $p$ for the binomial distribution) as hypotheses, then we have a continuous hypothesis space.
This leads to statistical questions of the form “Given the following data, what are the parameters of the distribution from which it is sampled?“.
For example, to determine whether a coin is fair, we want the posterior $P(\theta \mid D)$ for the "success parameter" $\theta$ given the observed data $D$ from $n$ trials. To compute this, we have
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
where $D$ contains $k$ successes in the $n$ trials. Computing $P(D \mid \theta)$ as
$$P(D \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$
requires an assumption that all the trials are independent.
As $P(x_i \mid \theta)$ is $\theta$ when $x_i$ is a success, and $1 - \theta$ otherwise, this formula can be expanded and simplified to give
$$P(D \mid \theta) = \theta^{k}\,(1 - \theta)^{\,n - k}$$
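This likelihood is simple to evaluate. A small sketch, with hypothetical data of 7 heads in 10 flips, showing that the likelihood peaks at the observed success frequency $k/n$:

```python
# Likelihood of a coin-flip sequence with k successes in n trials:
# P(D | theta) = theta^k * (1 - theta)^(n - k)

def likelihood(theta, k, n):
    return theta ** k * (1 - theta) ** (n - k)

k, n = 7, 10   # hypothetical data: 7 heads in 10 flips

# Evaluating on a coarse grid shows the peak at theta = k / n = 0.7.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda t: likelihood(t, k, n))
print(best)    # 0.7
```

Multiplying this likelihood by a prior over $\theta$ and normalising (as in the Bayes' rule section above) would give the full posterior.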