Typically, it is impractical or impossible to record data on every subject in a population (a census). As such, we must draw conclusions about the population from data on a subset of it (a sample).

The population can be viewed as a probability distribution with mean $\mu$ and variance $\sigma^2$.

Sample Statistics

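We can calculate the Measures of Central Tendency and Measures of Spread for individual samples, and use these to infer information about the whole population. The sample mean ($\bar{x}$) and sample variance ($s^2$) are calculated from the sample as usual. For a sample $x_1, \dots, x_n$ of size $n$,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$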
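As a small worked illustration of these calculations, the sketch below computes the sample mean and the sample variance (with divisor $n$) for a made-up sample. The data values and variable names are assumptions chosen for illustration only.

```python
import statistics

# Hypothetical sample of n = 8 observations (made-up values for illustration).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
sample_mean = sum(sample) / n                                  # x-bar
sample_var = sum((x - sample_mean) ** 2 for x in sample) / n   # divisor n

# Cross-check against the standard library: pvariance also divides by n.
assert abs(sample_mean - statistics.mean(sample)) < 1e-12
assert abs(sample_var - statistics.pvariance(sample)) < 1e-12

print(sample_mean, sample_var)
```

Note that `statistics.variance` would give a different value here, because it divides by $n - 1$ rather than $n$.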

Relationships Between Samples and their Populations

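The sample mean is an unbiased estimator of the population mean:

$$E(\bar{X}) = \mu$$

The sample variance is not an unbiased estimator of the population variance:

$$E(S^2) = \frac{n-1}{n}\sigma^2$$

where $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is the sample variance of a sample of size $n$.

This can be rearranged to find an unbiased estimator of the population variance:

$$E\left(\frac{n}{n-1}S^2\right) = \sigma^2$$

As such, the sample variance with divisor $n$ systematically underestimates the population variance, so we change the notation to distinguish the two estimators:

  • $s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is the unbiased estimator.
  • $s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is the biased estimator.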
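The bias can be checked empirically. The following is a minimal simulation sketch, assuming a normal population with variance 4 and samples of size 5 (both arbitrary choices). The function name `average_variance_estimates` is hypothetical. Averaged over many samples, the divisor-$n$ estimate should settle near $\frac{n-1}{n}\sigma^2 = 3.2$ and the divisor-$(n-1)$ estimate near $\sigma^2 = 4$.

```python
import random

def average_variance_estimates(n, trials, seed=0):
    """Average the divisor-n and divisor-(n-1) variance estimates over many samples."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        # Assumed population: normal with mean 0 and variance 4 (sigma = 2).
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        biased_total += sum_sq / n          # biased estimator
        unbiased_total += sum_sq / (n - 1)  # unbiased estimator
    return biased_total / trials, unbiased_total / trials

biased_avg, unbiased_avg = average_variance_estimates(n=5, trials=20000)
print("divisor n:    ", biased_avg)    # expected to be close to 3.2
print("divisor n - 1:", unbiased_avg)  # expected to be close to 4.0
```

The gap between the two averages shrinks as $n$ grows, since the factor $\frac{n-1}{n}$ approaches 1.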

Large Samples

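The law of large numbers states that the empirical mean converges to the population mean as the sample size increases:

$$\bar{X}_n \to \mu \quad \text{as } n \to \infty$$

where $\bar{X}_n$ is the mean of the first $n$ observations.

This law also gives us that, for all $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0$$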
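The convergence can be illustrated with a short simulation. This is a sketch only, assuming rolls of a fair six-sided die (population mean 3.5). The function name `running_mean_of_die_rolls` and the chosen sample sizes are hypothetical.

```python
import random

def running_mean_of_die_rolls(sizes, seed=1):
    """Return the mean of the first n rolls for each n in sizes."""
    rng = random.Random(seed)
    results = {}
    total = 0
    rolled = 0
    for n in sorted(sizes):
        # Keep rolling until n rolls have been made in total.
        while rolled < n:
            total += rng.randint(1, 6)  # fair die: each face equally likely
            rolled += 1
        results[n] = total / n
    return results

means = running_mean_of_die_rolls([10, 100, 1000, 10000, 100000])
for n, m in means.items():
    # The distance from 3.5 tends to shrink as n grows, though not
    # necessarily at every single step.
    print(n, m, abs(m - 3.5))
```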

Combining Random Variables

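For any random variables $X$ and $Y$,

  • Mean of $X \pm Y$ is $\mu_X \pm \mu_Y$.
  • Variance of $X \pm Y$ is $\sigma_X^2 + \sigma_Y^2$.
  • Standard deviation of $X \pm Y$ is $\sqrt{\sigma_X^2 + \sigma_Y^2}$.

The first fact always holds, while the latter two require $X$ and $Y$ to be independent.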
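These rules can be checked numerically. The sketch below assumes two independent normal variables, one with mean 10 and standard deviation 3 and one with mean 4 and standard deviation 2. These parameters, and the helper names `mean` and `variance`, are chosen purely for illustration. Both the sum and the difference should have variance close to $3^2 + 2^2 = 13$.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

rng = random.Random(2)
N = 50000

# Assumed distributions, drawn independently of each other.
xs = [rng.gauss(10, 3) for _ in range(N)]  # mean 10, variance 9
ys = [rng.gauss(4, 2) for _ in range(N)]   # mean 4, variance 4

sums = [x + y for x, y in zip(xs, ys)]
diffs = [x - y for x, y in zip(xs, ys)]

print("mean of X + Y:", mean(sums))       # expected near 10 + 4 = 14
print("mean of X - Y:", mean(diffs))      # expected near 10 - 4 = 6
print("var of X + Y: ", variance(sums))   # expected near 9 + 4 = 13
print("var of X - Y: ", variance(diffs))  # also near 13: variances add
print("sd of X + Y:  ", math.sqrt(variance(sums)))  # near sqrt(13)
```

The variance of the difference is a sum, not a difference, because subtracting an independent variable still adds its variability.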

Central Limit Theorem

The central limit theorem states that the distribution of sample means is well approximated by a normal distribution for large sample sizes.

This means that for samples of size $n$ drawn randomly from a parent distribution with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means is approximately a normal distribution with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, where $n$ is large.
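A simulation can make this concrete. The sketch below assumes a deliberately skewed parent distribution, the exponential distribution with rate 1 (mean 1, standard deviation 1), and samples of size 50. The function name `sample_means` and all parameter choices are hypothetical. If the normal approximation is reasonable, the sample means should have mean close to 1, standard deviation close to $\frac{1}{\sqrt{50}} \approx 0.141$, and roughly 68% of them should lie within one such standard deviation of 1.

```python
import math
import random

def sample_means(n, trials, seed=3):
    """Draw `trials` samples of size n and return the mean of each."""
    rng = random.Random(seed)
    # Assumed parent distribution: exponential with rate 1 (mean 1, sd 1).
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

n = 50
means = sample_means(n, trials=10000)

mean_of_means = sum(means) / len(means)
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in means) / len(means)
)
predicted_sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

# Fraction of sample means within one predicted sd of the population mean.
within_one_sd = sum(abs(m - 1.0) <= predicted_sd for m in means) / len(means)

print("mean of sample means:", mean_of_means)
print("sd of sample means:  ", sd_of_means, "predicted:", predicted_sd)
print("fraction within 1 sd:", within_one_sd)
```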