Arithmetic mean


The arithmetic mean is a function of a set of numbers that gives the sum of all the numbers divided by their count. Calling $x_{1},\ldots,x_{N}$ the $N$ numbers, the arithmetic mean is

$$\bar{x}\equiv\frac{1}{N} \sum_{i=1}^{N} x_{i}=\frac{x_{1}+\ldots+x_{N}}{N}$$

It is a measure of central tendency.
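The definition translates directly into code. The following is a minimal sketch; the function name `arithmetic_mean` is hypothetical. We assume the input is a non-empty collection of numbers, because the mean of zero numbers is undefined. Python's standard library already provides `statistics.mean`, so this version only makes the formula explicit.

```python
from typing import Iterable

def arithmetic_mean(xs: Iterable[float]) -> float:
    """Return (x_1 + ... + x_N) / N for a non-empty collection of numbers."""
    values = list(xs)
    if not values:
        # N = 0 would mean dividing by zero: the mean is undefined.
        raise ValueError("arithmetic mean of an empty collection is undefined")
    return sum(values) / len(values)

print(arithmetic_mean([2, 4, 9]))  # (2 + 4 + 9) / 3 → 5.0
```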

In statistics

The arithmetic mean commonly appears in statistics as a descriptive statistic of a sample. The sample may be viewed either as a set of random variables or as a set of observed data points. Despite its simplicity, the mean has remarkable properties that make it one of the most widely used statistics.

Consider a sample of $N$ independent random variables $X_{1},\ldots,X_{N}$. They share a common true mean $\mu_{i}=\mathrm{E}[X_{i}]=\mu$ and have true variances $\sigma_{i}^{2}=\text{var}(X_{i})$. The sample mean is

$$\boxed{\bar{X}=\frac{1}{N}\sum_{i=1}^{N} X_{i}}$$

This quantity is a function of random variables, so it is itself a random variable with its own probability distribution. Its expected value and variance follow directly from those of the $X_{i}$, and together they tell us how well $\bar{X}$ estimates $\mu$. By linearity of expectation, the expected value is:

$$\mathrm{E}[\bar{X}]=\frac{1}{N}\sum_{i=1}^{N} \mathrm{E}[X_{i}]=\frac{1}{N}\sum_{i=1}^{N} \mu_{i}=\mu$$

In other words, the sample (arithmetic) mean is an unbiased estimator of the true mean. The variance of this estimator is:

$$\text{var}(\bar{X})=\text{var}\left( \frac{1}{N}\sum_{i=1}^{N} X_{i} \right)=\frac{1}{N^{2}}\text{var}\left( \sum_{i=1}^{N} X_{i} \right)=\frac{1}{N^{2}}\sum_{i=1}^{N} \text{var}(X_{i})=\frac{1}{N^{2}}\sum_{i=1}^{N} \sigma_{i}^{2}$$

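This uses two properties of the variance. First, $\text{var}(aY)=a^{2}\,\text{var}(Y)$ for a constant $a$. Second, the variance of a sum of independent variables is the sum of their variances. If $X_{1},\ldots,X_{N}$ are iid, then $\sigma_{i}^{2}=\sigma^{2}$ for every $i$ and $\sum_{i=1}^{N}\sigma^{2}=N\sigma^{2}$, so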

$$\boxed{\text{var}(\bar{X})=\frac{\sigma ^{2}}{N}\quad\text{if }X_{1},\ldots,X_{N}\text{ are iid}}$$
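A quick Monte Carlo experiment can make the two results concrete: $\mathrm{E}[\bar{X}]=\mu$ and $\text{var}(\bar{X})=\sigma^{2}/N$ in the iid case. The following is a sketch under assumptions of our own. The $X_i$ are drawn from an Exponential distribution with rate 0.5, which is deliberately non-Gaussian and has $\mu=2$ and $\sigma^{2}=4$. We use $N=25$ and 20,000 repeated samples. The helper `simulate_sample_means` is hypothetical. The empirical values should land close to the theoretical ones, but they will not match them exactly.

```python
import random
import statistics

def simulate_sample_means(n: int, trials: int, rate: float, seed: int = 0) -> list[float]:
    """Draw `trials` independent iid samples of size n from Exponential(rate); return each sample's mean."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(n)) / n for _ in range(trials)]

rate = 0.5
mu, sigma2 = 1 / rate, 1 / rate**2   # Exponential(rate) has mean 1/rate and variance 1/rate^2
n, trials = 25, 20_000
means = simulate_sample_means(n, trials, rate)

# Unbiasedness: the average of many sample means should be close to mu.
print(f"empirical E[Xbar]   = {statistics.fmean(means):.3f}   theory: mu = {mu}")
# Precision: the spread of the sample means should be close to sigma^2 / N.
print(f"empirical var(Xbar) = {statistics.variance(means):.4f}  theory: sigma^2/N = {sigma2 / n}")
```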

If they are independent but not identically distributed, we can define the average population variance

$$\bar{\sigma}^{2}=\frac{1}{N}\sum_{i=1}^{N} \sigma ^{2}_{i}$$

for which

$$\boxed{\text{var}(\bar{X})=\frac{\bar{\sigma}^{2}}{N}\quad\text{if }X_{1},\ldots,X_{N}\text{ are independent}}$$

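This form reduces to the previous one for iid variables, since then $\bar{\sigma}^{2}=\sigma^{2}$. These variances quantify how precise $\bar{X}$ is as an estimate of $\mu$. The smaller the variance, the more tightly $\bar{X}$ concentrates around $\mu$.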
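The non-identical case can be checked numerically in the same way. The following is a minimal sketch under illustrative assumptions. There are four independent variables with a common mean $\mu=3$ and hypothetical standard deviations 0.5, 1, 2 and 4. For convenience each is drawn from a Gaussian, although the variance formula does not depend on that choice. The helper name `sample_mean_heteroscedastic` is ours. The empirical variance of $\bar{X}$ should be close to $\bar{\sigma}^{2}/N$.

```python
import random
import statistics

def sample_mean_heteroscedastic(mu: float, sigmas: list[float], rng: random.Random) -> float:
    """Sample mean of independent X_i ~ Normal(mu, sigma_i^2), one draw per sigma_i."""
    return sum(rng.gauss(mu, s) for s in sigmas) / len(sigmas)

mu = 3.0
sigmas = [0.5, 1.0, 2.0, 4.0]                # hypothetical per-variable standard deviations
n = len(sigmas)
sigma_bar2 = sum(s**2 for s in sigmas) / n   # average population variance, here 21.25 / 4
rng = random.Random(1)
means = [sample_mean_heteroscedastic(mu, sigmas, rng) for _ in range(50_000)]

print(f"empirical E[Xbar]   = {statistics.fmean(means):.3f}   theory: mu = {mu}")
print(f"empirical var(Xbar) = {statistics.variance(means):.4f}  theory: sigma_bar^2/N = {sigma_bar2 / n:.4f}")
```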

These formulas hold regardless of the probability distributions of $X_{1},\ldots,X_{N}$, as long as the variances are finite. They become particularly useful once we recall that the arithmetic mean obeys the (weak) law of large numbers. For every $\varepsilon>0$,

$$\lim_{ N \to \infty } P(\lvert \bar{X}-\mu \rvert \geq \varepsilon)=0$$

In other words, as the number of random variables (in practice, data points) grows, the sample mean becomes a progressively more precise estimator of the true mean.¹ Moreover, suppose the variables are iid with finite variance $\sigma^{2}$. The central limit theorem then states that the rescaled quantity $\sqrt{N}(\bar{X}-\mu)$ converges in distribution to the Gaussian $\mathcal{N}(0,\sigma^{2})$.
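Both limits can be watched in a small simulation. The following is a sketch with assumed parameters: iid Exponential(1) variables, chosen because they are skewed (so $\mu=\sigma=1$), $\varepsilon=0.1$, and 2,000 repeated samples for each $N$. The helper `sample_mean` is hypothetical. For the law of large numbers, the estimated $P(\lvert\bar{X}-\mu\rvert\geq\varepsilon)$ should shrink as $N$ grows. For the central limit theorem, the standardized mean $\sqrt{N}(\bar{X}-\mu)/\sigma$ should fall inside $\pm 1.96$ about 95% of the time, because 1.96 is the two-sided 95% quantile of $\mathcal{N}(0,1)$.

```python
import math
import random

def sample_mean(rng: random.Random, n: int, rate: float) -> float:
    """Mean of n iid Exponential(rate) draws (true mean 1/rate, variance 1/rate^2)."""
    return sum(rng.expovariate(rate) for _ in range(n)) / n

rng = random.Random(42)
rate, eps, trials = 1.0, 0.1, 2_000
mu = sigma = 1 / rate          # for Exponential(rate), the mean and the std are both 1/rate
results = {}

for n in (10, 100, 1000):
    means = [sample_mean(rng, n, rate) for _ in range(trials)]
    # Law of large numbers: Monte Carlo estimate of P(|Xbar - mu| >= eps).
    p_far = sum(abs(m - mu) >= eps for m in means) / trials
    # CLT: sqrt(n) * (Xbar - mu) / sigma is approximately N(0, 1) for large n.
    p_inside = sum(abs(math.sqrt(n) * (m - mu) / sigma) < 1.96 for m in means) / trials
    results[n] = (p_far, p_inside)
    print(f"N={n:4d}  P(|Xbar-mu| >= {eps}) ~ {p_far:.3f}   P(|Z| < 1.96) ~ {p_inside:.3f}")
```

The exact numbers depend on the seed. The trend is what matters: the first column falls toward 0, and the second approaches 0.95.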

Footnotes

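  1. You can also see this from the fact that $\text{var}(\bar{X})\to 0$ as $N\to \infty$. By Chebyshev's inequality, $P(\lvert \bar{X}-\mu \rvert \geq \varepsilon)\leq \text{var}(\bar{X})/\varepsilon^{2}$, which then tends to $0$.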