Variance


The variance $\sigma^{2}$ of a random variable $X$ is a measure of its statistical dispersion. It is defined as the expected value of the squared deviation from the population mean $\mathrm{E}[X]=\mu$:

$$\text{var}(X)\equiv\sigma^{2}_{X}=\mathrm{E}[(X-\mu)^{2}]$$

The variance can also be expressed as

$$\begin{align}
\text{var}(X)&=\mathrm{E}[(X-\mathrm{E}[X])^{2}] \\
&=\mathrm{E}[X^{2}-2X\mathrm{E}[X]+\mathrm{E}[X]^{2}] \\
&=\mathrm{E}[X^{2}]-2\mathrm{E}[X]\mathrm{E}[X]+\mathrm{E}[X]^{2} \\
&=\mathrm{E}[X^{2}]-\mathrm{E}[X]^{2} \\
&=\mathrm{E}[X^{2}]-\mu^{2}
\end{align}$$

or as the covariance of a variable with itself:

$$\text{var}(X)=\text{cov}(X,X)$$
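As a quick sanity check, the definition $\mathrm{E}[(X-\mu)^{2}]$ and the shortcut $\mathrm{E}[X^{2}]-\mu^{2}$ can be compared on a small population. This is only a sketch: the function names and the list of values are hypothetical, and every value is assumed to be equally likely.

```python
from fractions import Fraction

def variance_by_definition(values):
    """E[(X - mu)^2] for a population of equally likely values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def variance_by_shortcut(values):
    """E[X^2] - mu^2 for the same population."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2

# Hypothetical population; Fractions keep the arithmetic exact.
population = [Fraction(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]
print(variance_by_definition(population))  # 4
print(variance_by_shortcut(population))    # 4
```

With floating-point inputs the shortcut form can lose precision when the mean is large compared to the spread, so the definition is usually the safer one to compute with.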

The main appeal of variance as a measure of dispersion is that it is mathematically convenient, both in calculations and in deriving results. For instance, Chebyshev's inequality uses the variance to bound how likely the variable is to take values far from its mean. Furthermore, the variance is the second central moment of a probability distribution.

The square root of the variance is the standard deviation: $\sqrt{\sigma^{2}}=\sigma$. The variance is (approximately) related to the absolute error $\Delta$ by $\sigma^{2}=\Delta^{2}/3$, which is the variance of a uniform distribution over an interval of half-width $\Delta$.
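The relation $\sigma^{2}=\Delta^{2}/3$ can be checked with a small simulation. The sketch below assumes that the error is uniformly distributed over $[-\Delta,+\Delta]$, which the text does not state explicitly; the value of `delta`, the seed and the sample size are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
delta = 0.5     # hypothetical absolute error (half-width of the interval)

# Assumption: errors are uniformly distributed over [-delta, +delta].
errors = [random.uniform(-delta, delta) for _ in range(200_000)]

mean = sum(errors) / len(errors)
empirical = sum((e - mean) ** 2 for e in errors) / len(errors)
predicted = delta ** 2 / 3

# The two numbers should agree to about three decimal places.
print(empirical, predicted)
```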

Definitions

Discrete random variable

Given a discrete random variable $X$ with probability mass function $p_{X}(x)$ and expected value $\mu$, the variance is

$$\text{var}(X)=\sum_{i=1}^{n} p_{X}(x_{i})(x_{i}-\mu)^{2}$$
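As an illustration, the sum above can be evaluated directly for a fair six-sided die. The dictionary representation of the probability mass function and the name `discrete_variance` are choices made for this sketch only.

```python
from fractions import Fraction

def discrete_variance(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

# Fair die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(discrete_variance(fair_die))  # 35/12
```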

Continuous random variable

Given a continuous random variable $X$ with probability density function $f_{X}(x)$ and expected value $\mu$, the variance is

$$\text{var}(X)=\int_{-\infty}^{+\infty}f_{X}(x)(x-\mu)^{2}\ dx$$

or equivalently

$$\text{var}(X)=\int_{-\infty}^{+\infty}x^{2}f_{X}(x)\ dx-\mu^{2}$$
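Both integral forms can be approximated numerically. The sketch below uses a hypothetical density $f_{X}(x)=2x$ on $[0,1]$ (zero elsewhere) and a simple midpoint rule; neither choice comes from the text. For this density the exact values are $\mu=2/3$ and $\text{var}(X)=1/18$.

```python
def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def density(x):
    # Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere.
    return 2.0 * x

a, b = 0.0, 1.0  # the density vanishes outside this interval
mu = integrate(lambda x: x * density(x), a, b)
var_definition = integrate(lambda x: (x - mu) ** 2 * density(x), a, b)
var_shortcut = integrate(lambda x: x * x * density(x), a, b) - mu ** 2

print(mu, var_definition, var_shortcut)  # close to 2/3, 1/18, 1/18
```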

Multiple variables

The variance is not defined for multiple variables (see instead Covariance). However, it is possible to find the variance of one variable among many from the joint distribution function.

For $N$ continuous random variables $X_{1},\ldots,X_{N}$ with JDF $f(x_{1},\ldots,x_{N})$, the variance of $X_{i}$ is

$$\text{var}(X_{i})=\mathrm{E}[(X_{i}-\mu_{i})^{2}]=\int_{\Omega_{N}}\ldots \int_{\Omega_{1}}(x_{i}-\mu_{i})^{2}f(x_{1},\ldots,x_{N})\ dx_{1}\ldots dx_{N}$$

where $\mu_{i}=\mathrm{E}[X_{i}]$ is the expected value of $X_{i}$.
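For two variables, the multiple integral can be approximated on a grid. The sketch below assumes a hypothetical joint density $f(x,y)=x+y$ on the unit square, for which $\mathrm{E}[X]=7/12$ and $\text{var}(X)=11/144$; the density, the grid size and the function names are illustrative only.

```python
def integrate_2d(func, n=200):
    """Midpoint-rule approximation of a double integral over the unit square."""
    h = 1.0 / n
    points = [(k + 0.5) * h for k in range(n)]
    return h * h * sum(func(x, y) for x in points for y in points)

def joint_density(x, y):
    # Hypothetical joint density on [0, 1] x [0, 1]; it integrates to 1.
    return x + y

# Expected value and variance of X alone, computed from the joint density.
mu_x = integrate_2d(lambda x, y: x * joint_density(x, y))
var_x = integrate_2d(lambda x, y: (x - mu_x) ** 2 * joint_density(x, y))

print(mu_x, var_x)  # close to 7/12 and 11/144
```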

Sample variance

By definition, calculating the true variance requires knowing the true mean $\mu$. This is often not the case, so the true mean must be replaced by one of its estimators: the sample mean. For a random sample of independent variables $X_{1},\ldots,X_{N}$, we can take the average of the squared deviations from the sample mean $\bar{X}$:

$$S^{2}_\text{biased}=\frac{1}{N}\sum_{i=1}^{N} (X_{i}-\bar{X})^{2}$$

This is known as the sample variance and is an estimator of the true variance. In this form, however, it is a biased estimator of the true variance $\sigma^{2}$: its expected value is $\frac{N-1}{N}\sigma^{2}$ rather than $\sigma^{2}$. To fix the bias, we apply Bessel's correction (changing $N$ to $N-1$):

$$S^{2}=\frac{1}{N-1}\sum_{i=1}^{N} (X_{i}-\bar{X})^{2}$$

This is the unbiased sample variance and is the appropriate estimator for the true variance in most cases.[1] Its expected value is, in fact, $\mathrm{E}[S^{2}]=\sigma^{2}$.
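The effect of Bessel's correction can be seen by averaging both estimators over many small samples. The following is a sketch under stated assumptions: the samples are drawn from a standard normal distribution (true variance 1), and the helper `sample_variance` with its `ddof` parameter is a hypothetical name used only here.

```python
import random

def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by N - ddof."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

random.seed(1)  # fixed seed so the run is repeatable
N, trials = 5, 20_000
biased_total = unbiased_total = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]  # true variance is 1
    biased_total += sample_variance(sample, ddof=0)      # divides by N
    unbiased_total += sample_variance(sample, ddof=1)    # divides by N - 1

avg_biased = biased_total / trials
avg_unbiased = unbiased_total / trials
print(avg_biased, avg_unbiased)  # roughly (N - 1) / N = 0.8 and 1.0
```

The standard library offers the same two estimators as `statistics.pvariance` (divides by $N$) and `statistics.variance` (divides by $N-1$).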

Properties

  • If $X_{1},\ldots,X_{N}$ are independent variables, the variance of the sum is the sum of the variances: $\text{var}\left( \sum_{i=1}^{N}X_{i} \right)=\sum_{i=1}^{N}\text{var}(X_{i})$. Also, a constant factor scales the variance by its square, not linearly: $\text{var}\left( a\sum_{i=1}^{N}X_{i} \right)=a^{2}\text{var}\left( \sum_{i=1}^{N}X_{i} \right)$.
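Both properties can be verified exactly on two independent fair dice by enumerating all 36 equally likely outcomes. The constant `a = 3` and the helper name `variance` are arbitrary choices for this sketch.

```python
from fractions import Fraction
from itertools import product

def variance(outcomes):
    """Population variance of a list of equally likely outcomes."""
    n = len(outcomes)
    mu = sum(outcomes) / n
    return sum((x - mu) ** 2 for x in outcomes) / n

faces = [Fraction(f) for f in range(1, 7)]
# Independence: all 36 pairs of faces are equally likely.
sums = [x + y for x, y in product(faces, repeat=2)]

a = 3  # hypothetical constant factor
var_single = variance(faces)
var_sum = variance(sums)
var_scaled = variance([a * s for s in sums])

print(var_single, var_sum, var_scaled)  # 35/12 35/6 105/2
```

Here `var_sum` is twice `var_single`, and `var_scaled` is `a ** 2 = 9` times `var_sum`.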

Propagation of variance

It is common for a quantity $w$ to be dependent on other quantities $x,y,\ldots$ according to some function $w(x,y,\ldots)$. If we take the variables $x,y,\ldots$ to be independent of each other, the variance of $w$ is expressed by the law of propagation of variance:

$$\boxed{\begin{align}
\text{var}(w)&=\sum_{x_{i}=x,y,\ldots}\left( \frac{ \partial w }{ \partial x_{i} } \right)^{2}\text{var}(x_{i}) \\
&=\left( \frac{ \partial w }{ \partial x } \right)^{2}\text{var}(x)+\left( \frac{ \partial w }{ \partial y } \right)^{2}\text{var}(y)+\ldots
\end{align}}$$

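For this to work, $w$ must be twice-differentiable in all of its arguments, and the partial derivatives are evaluated at the mean values of $x,y,\ldots$. The law is a special case of the general properties of functions of random variables, and it is still only an approximation: it is exact when $w$ is linear in its arguments.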
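The law can be applied numerically when the partial derivatives are awkward to write out. The sketch below estimates them with central differences at the mean values; the function `propagate_variance`, its parameters and the example quantity $w(x,y)=xy$ are hypothetical and are not taken from the text.

```python
def propagate_variance(func, means, variances, rel_step=1e-6):
    """First-order variance of func(*means), inputs assumed independent."""
    total = 0.0
    for i, (m, var) in enumerate(zip(means, variances)):
        h = rel_step * max(1.0, abs(m))
        up = list(means)
        down = list(means)
        up[i] = m + h
        down[i] = m - h
        # Central-difference estimate of the partial derivative at the means.
        partial = (func(*up) - func(*down)) / (2.0 * h)
        total += partial ** 2 * var
    return total

def area(x, y):
    # Hypothetical quantity w(x, y) = x * y.
    return x * y

# x = 10 with standard deviation 0.1, y = 5 with standard deviation 0.2.
var_w = propagate_variance(area, means=[10.0, 5.0], variances=[0.1 ** 2, 0.2 ** 2])
print(var_w)  # close to 5**2 * 0.01 + 10**2 * 0.04 = 4.25
```

For a product of independent variables the exact variance also contains the term $\text{var}(x)\,\text{var}(y)$, here $0.0004$, which the first-order law drops. This shows in what sense the result is an approximation.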

Footnotes

  1. But not all. Bias is far from the only problem to consider when choosing an estimator. $S^{2}_\text{biased}$ has legitimate use cases: it has lower variance than the unbiased $S^{2}$, at the cost of systematically missing the mark because of the bias. If using $S^{2}$ instead of $S^{2}_\text{biased}$ makes the estimates scatter widely because of the higher variance, the small gain from removing the bias is probably not enough to offset that, and you end up with worse estimates despite using the unbiased estimator. This usually happens when the sample size $N$ is small, where a bit of bias is often an acceptable price to pay for reduced variance.
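The trade-off described in this footnote can be made concrete by comparing the mean squared error of the two estimators on small samples. The sketch assumes normally distributed data with true variance 1 and $N=5$; the helper name, the seed and the number of trials are arbitrary, and other distributions may behave differently.

```python
import random

def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by N - ddof."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

random.seed(2)  # fixed seed so the run is repeatable
N, trials, true_var = 5, 20_000, 1.0
sq_err_biased = sq_err_unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    sq_err_biased += (sample_variance(sample, 0) - true_var) ** 2
    sq_err_unbiased += (sample_variance(sample, 1) - true_var) ** 2

# Mean squared error = variance of the estimator + squared bias.
mse_biased = sq_err_biased / trials
mse_unbiased = sq_err_unbiased / trials
print(mse_biased, mse_unbiased)
```

For normal data the theoretical values are $(2N-1)\sigma^{4}/N^{2}=0.36$ for the biased estimator and $2\sigma^{4}/(N-1)=0.5$ for the unbiased one, so the biased estimator is closer to the true variance on average despite its bias.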