Interpretations of Probabilities
Objective Probability
Objective probabilities apply to repeatable events, such as dice rolls or coin flips. These can be modelled as happening many times, with the probability of an outcome interpreted as its long-run frequency.
Subjective Probability
Subjective probabilities apply to unrepeatable events, such as the winner of a specific competition. In this interpretation, all uncertain events are treated probabilistically. While it may seem counter-intuitive, or even inaccurate, to model one-off events probabilistically, the approach works very well in practice.
Bruno de Finetti’s Theorem
This theorem states that, given a person who
- Has a set of beliefs which violate the axioms of probability, and
- Is willing to accept any fair bet on these beliefs,
then there exists a set of bets which he will accept, in which he is guaranteed (i.e. with probability $1$) to lose money. This means that irrational beliefs can be exploited.
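As an illustrative sketch (the numbers are hypothetical, not from the source), suppose a bettor assigns probability $0.6$ to an event and also $0.6$ to its complement, violating the axiom that these must sum to $1$. Selling him a "fair" bet on each side at those prices guarantees he loses:

```python
# Dutch book sketch: incoherent beliefs P(A) = 0.6 and P(not A) = 0.6
# (these sum to 1.2, violating the probability axioms).
# A bet paying 1 unit if an event occurs is "fair" to someone who
# believes the event has probability p when the bet costs p.

p_A = 0.6        # bettor's belief that A happens
p_not_A = 0.6    # bettor's belief that A does not happen

# The bettor accepts both bets, each paying 1 if its event occurs.
cost = p_A + p_not_A    # total stake paid: 1.2
payout = 1.0            # exactly one of A, not-A occurs, paying 1

net = payout - cost     # bettor's net result in EVERY outcome
print(net)              # -0.2: a guaranteed loss
```

Whatever happens, the bettor is down $0.2$, which is exactly the amount by which his beliefs over-count the total probability.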
Notation
For a random variable $X$ over a set $\mathcal{X}$, $P(X = x)$ can be written shorthand as $P(x)$.
The probability of "$X = x$" and "$Y = y$", often written as $P(x \cap y)$, is written here as $P(x, y)$.
If $\mathcal{X}$ is finite or countably infinite, then $P(x)$ is a probability mass function (PMF). If $\mathcal{X}$ is continuous, then a probability density function (PDF) $p(x)$ exists instead. In the case of a PDF, $p(x)$ is not the probability that $X = x$, as this would be $0$ for all $x$. Instead,
$$P(a \le X \le b) = \int_a^b p(x)\,dx$$
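A quick numerical check of this idea, using the standard normal distribution as an example (the CDF can be written in terms of the error function, so the standard library suffices):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# For a continuous X, p(x) alone is not a probability; probabilities
# come from integrating the density over an interval.
prob = normal_cdf(1.0) - normal_cdf(-1.0)   # P(-1 <= X <= 1)
print(round(prob, 4))                        # ~0.6827 for a standard normal
```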
Bayes’ Rule
The definition of conditional probability is:
$$P(x \mid y) = \frac{P(x, y)}{P(y)}$$
This can be re-arranged to give
$$P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x)$$
From this, Bayes' rule follows:
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
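The rule can be verified numerically on a small joint distribution (the table below is made up for illustration):

```python
# A toy joint distribution P(x, y); entries sum to 1.
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

def P(pred):
    """Total probability of all (x, y) pairs satisfying a predicate."""
    return sum(p for xy, p in joint.items() if pred(xy))

P_x1 = P(lambda xy: xy[0] == "x1")      # marginal P(x1)
P_y0 = P(lambda xy: xy[1] == "y0")      # marginal P(y0)
P_x1_y0 = joint[("x1", "y0")]           # joint P(x1, y0)

# Conditionals from the definition P(x | y) = P(x, y) / P(y):
P_x1_given_y0 = P_x1_y0 / P_y0
P_y0_given_x1 = P_x1_y0 / P_x1

# Bayes' rule: P(x | y) = P(y | x) P(x) / P(y)
bayes = P_y0_given_x1 * P_x1 / P_y0
print(P_x1_given_y0, bayes)   # both equal 0.2 / 0.3, about 0.667
```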
Marginalisation
From a joint distribution, one of the two variables can be integrated out, or “marginalised”.
Discrete
$$P(x) = \sum_{y} P(x, y)$$
Continuous
$$p(x) = \int p(x, y)\,dy$$
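The discrete case is a straightforward sum over the unwanted variable. A minimal sketch, reusing a made-up joint table:

```python
# Marginalise y out of a toy joint distribution: P(x) = sum_y P(x, y).
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

P_x = {}
for (x, y), p in joint.items():
    P_x[x] = P_x.get(x, 0.0) + p   # accumulate over all values of y

print(P_x)   # {'x0': 0.4, 'x1': 0.6}
```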
Independence
If $X$ and $Y$ are independent, then $P(x, y) = P(x)\,P(y)$.
Conditional Independence
Sometimes, two variables may be independent only when conditioned on a third variable. For example, given two variables $X$ and $Y$ which are independent when conditioned on a third variable $Z$,
$$P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$$
However, this does not imply that
$$P(x, y) = P(x)\,P(y)$$
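A concrete example of this distinction, with hypothetical numbers: pick a coin $Z$ (fair or biased), then flip it twice, giving $X$ and $Y$. Given the coin, the flips are independent; marginally, they are not, because both flips carry information about which coin was chosen.

```python
from itertools import product

P_z = {"fair": 0.5, "biased": 0.5}       # prior over which coin is used
P_heads = {"fair": 0.5, "biased": 0.9}   # P(heads | z)

def P_xy_given_z(x, y, z):
    """Conditional independence built in: flips factorise given z."""
    px = P_heads[z] if x == "H" else 1 - P_heads[z]
    py = P_heads[z] if y == "H" else 1 - P_heads[z]
    return px * py

# Marginal joint: P(x, y) = sum_z P(x, y | z) P(z)
P_xy = {(x, y): sum(P_xy_given_z(x, y, z) * P_z[z] for z in P_z)
        for x, y in product("HT", repeat=2)}
P_x = {x: sum(P_xy[(x, y)] for y in "HT") for x in "HT"}
P_y = {y: sum(P_xy[(x, y)] for x in "HT") for y in "HT"}

# P(H, H) = 0.53, but P(H) * P(H) = 0.49: marginal independence fails.
print(P_xy[("H", "H")], P_x["H"] * P_y["H"])
```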
Terminology
Writing $H$ for a hypothesis and $D$ for the observed data:
- Posterior: $P(H \mid D)$ - this is usually the goal.
- Likelihood: $P(D \mid H)$
- Prior: $P(H)$
- The normalisation (or evidence): $P(D)$ - this is often difficult to calculate.
Put more simply, the prior is the belief before the analysis of evidence, and the posterior is the belief after. Bayes’ rule is used to update the belief as a result of new evidence, and in the context of the prior.
Computing the Evidence
The evidence, $P(D)$, can be calculated as a marginalisation over all possible hypotheses. For a finite number of hypotheses, given by $H_1, \dots, H_n$,
$$P(D) = \sum_{i=1}^{n} P(D \mid H_i)\,P(H_i)$$
This computation is often intractable.
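For a small hypothesis set the sum is direct. A minimal sketch with illustrative numbers (the priors and likelihoods are invented):

```python
# Evidence as marginalisation over a finite hypothesis set:
# P(D) = sum_i P(D | H_i) P(H_i).
priors = [0.5, 0.3, 0.2]          # P(H_i), summing to 1
likelihoods = [0.01, 0.2, 0.5]    # P(D | H_i)

evidence = sum(l * p for l, p in zip(likelihoods, priors))

# The evidence normalises the posteriors so they sum to 1.
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
print(evidence, posteriors)
```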
The Prosecutor’s Fallacy
Confusing the likelihood with the posterior, and ignoring the prior, is known as The Prosecutor’s Fallacy, and is a common mistake. It can be easily seen in the following example:
- Given a population of 1 million people, all of whom have DNA recorded in a database,
- A crime is committed, and is known to have been committed by one of the members of the population.
- DNA evidence is recovered, and identified as a specific individual from a process which produces false negatives and false positives with a frequency of 1 in a million.
- Properly analysing the data, with a uniform prior of $\frac{1}{1{,}000{,}000}$ over the population, reveals that the probability that the DNA does belong to the identified person is only about $\frac{1}{2}$.
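The steps above can be checked with Bayes' rule directly: the tiny prior of being the culprit cancels against the tiny false-positive rate, leaving roughly even odds rather than near-certain guilt.

```python
# Bayes for the DNA example: population of 1,000,000, error rate 1e-6.
prior = 1 / 1_000_000                 # P(guilty) before the DNA evidence
p_match_given_guilty = 1 - 1e-6       # test rarely misses the true source
p_match_given_innocent = 1e-6         # false positive rate

# Evidence: P(match) over both hypotheses.
evidence = (p_match_given_guilty * prior
            + p_match_given_innocent * (1 - prior))

posterior = p_match_given_guilty * prior / evidence
print(posterior)   # ~0.5, not "1 in a million chance of innocence"
```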
Statistics as Parameter Estimation
If we interpret the possible parameters for a given probability distribution (e.g. $\mu$ and $\sigma$ for the normal distribution, or $n$ and $p$ for the binomial distribution) as hypotheses, then we have a continuous hypothesis space.
This leads to statistical questions of the form “Given the following data, what are the parameters of the distribution from which it is sampled?“.
For example, to determine whether a coin is fair, we want the posterior $P(\theta \mid D)$ for the "success parameter" $\theta$ given the observed data $D$ from $n$ trials. To compute this, we have
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
where $D$ contains $k$ successes in the $n$ trials. Computing $P(D \mid \theta)$ as
$$P(D \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$
requires an assumption that all the trials are independent.
As $P(x_i \mid \theta)$ is $\theta$ when $x_i$ is a success, and $1 - \theta$ otherwise, this formula can be expanded and simplified to give
$$P(D \mid \theta) = \theta^{k}\,(1 - \theta)^{\,n - k}$$
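This likelihood is simple to evaluate. A small sketch, with hypothetical data of 7 heads in 10 flips, showing that the likelihood peaks at the observed success frequency $k/n$:

```python
# Likelihood of a coin-flip sequence with k successes in n trials:
# P(D | theta) = theta^k * (1 - theta)^(n - k)

def likelihood(theta, k, n):
    return theta ** k * (1 - theta) ** (n - k)

k, n = 7, 10   # hypothetical data: 7 heads in 10 flips

# Evaluating on a coarse grid shows the peak at theta = k / n = 0.7.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda t: likelihood(t, k, n))
print(best)    # 0.7
```

Multiplying this likelihood by a prior over $\theta$ and normalising (as in the Bayes' rule section above) would give the full posterior.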