Expected value


The expected value or expectation $\text{E}[X]$ of a random variable $X$ is a generalization of a weighted average over all possible values the variable can take. It is what the word "mean" typically refers to in the context of statistics, though the word has other meanings too, such as the arithmetic mean of a sample. It is the first raw moment of the probability distribution. The name "expectation" refers to both the value $\text{E}[X]$ itself and the operator $\text{E}$ that is applied to $X$. It is also commonly denoted with the letter $\mu$.

The name can be misleading: the expected value is not the most likely value (that would be the mode). It is strictly theoretical and may not even be an allowed value of the random variable: for instance, the expected value of a fair six-sided die is 3.5, which is not even on the die.

The definition differs between discrete and continuous variables and, for discrete variables, between finitely many and countably infinitely many outcomes.

Discrete variable with finite outcomes

Given a discrete random variable $X$ with a finite set of possible outcomes $\{ x_{1},\ldots,x_{N} \}$ and a probability mass function $p_{X}(x)$, the expected value is

$$\text{E}[X]=x_{1}p_{X}(x_{1})+\ldots+x_{N}p_{X}(x_{N})=\sum_{i=1}^{N} x_{i}p_{X}(x_{i})$$

which is just the average of the outcomes, each weighted by its probability.
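As a small illustration of the finite sum, the sketch below computes the expected value of a fair six-sided die with exact fractions. The function name `expected_value` and the dictionary representation of the probability mass function are choices made for this example, not part of the definition.

```python
from fractions import Fraction

def expected_value(pmf):
    """Expected value of a finite discrete variable given as {outcome: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    # Weighted sum: each outcome times its probability.
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expected_value(die)

print(die_mean)         # 7/2
print(float(die_mean))  # 3.5
```

Using `Fraction` instead of floats keeps the check that the probabilities sum to 1 exact.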

Discrete variable with countably infinite outcomes

Under the same conditions as above, but with a countably infinite set of outcomes $\{ x_{i} \}_{i}$, the expected value can be defined by extending the sum to an infinite series:

$$\text{E}[X]=\sum_{i=1}^{\infty} x_{i}p_{X}(x_{i})$$

The Riemann series theorem states that a conditionally convergent series (one that converges, but not absolutely) can be rearranged to converge to any value, or to diverge. Since the outcomes of a random variable have no inherent order, the expected value must not depend on the order in which the terms are summed. Thus, this definition only holds if the series converges absolutely, in which case the order is not important. If the series is not absolutely convergent, the expected value is either infinite or undefined, and it is said that the variable does not have finite expectation.
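The effect of reordering can be seen numerically. As a hypothetical example, take a variable with outcomes $x_{k}=(-1)^{k+1}2^{k}/k$ and probabilities $p_{X}(x_{k})=2^{-k}$ for $k=1,2,\ldots$: the terms $x_{k}p_{X}(x_{k})$ form the alternating harmonic series, which converges only conditionally. The sketch below sums the same terms in two orders; the function names are made up for this example.

```python
import math

def natural_order_sum(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged_sum(n_blocks):
    """Same terms, reordered: two positive terms, then one negative term.

    Block j contributes 1/(4j-3) + 1/(4j-1) - 1/(2j).
    """
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total

usual = natural_order_sum(200_000)
reordered = rearranged_sum(100_000)

print(usual, math.log(2))            # usual order approaches ln 2
print(reordered, 1.5 * math.log(2))  # reordered terms approach (3/2) ln 2
```

Both sums use exactly the same terms, yet they approach different limits, which is why such a variable is not assigned an expected value.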

Continuous variable

Given a continuous random variable $X$ with a probability density function $f_{X}(x)$, the expected value is

$$\text{E}[X]=\int_{-\infty}^{\infty} xf_{X}(x) \ dx$$

As with the series above, the integral must converge absolutely; if it does not, the variable does not have finite expectation.
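A rough numerical sketch of both cases follows. It approximates the integral with a plain midpoint rule (a choice made here for simplicity, not a recommended integrator). The exponential density with rate 2 and the standard Cauchy density are illustrative picks: the first has mean 1/2, while for the second the integral of $\lvert x \rvert f_{X}(x)$ grows without bound as the cutoff widens.

```python
import math

def midpoint_integral(g, lower, upper, steps):
    """Approximate the integral of g over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(g(lower + (k + 0.5) * width) for k in range(steps))

rate = 2.0  # illustrative rate parameter

def exp_pdf(x):
    """Exponential density on x >= 0."""
    return rate * math.exp(-rate * x)

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

# The exponential tail beyond x = 50 is negligible, so a finite limit suffices.
exp_mean = midpoint_integral(lambda x: x * exp_pdf(x), 0.0, 50.0, 100_000)

def cauchy_abs_moment(cutoff):
    """Integral of |x| f(x) over [-cutoff, cutoff] for the Cauchy density."""
    return midpoint_integral(lambda x: abs(x) * cauchy_pdf(x), -cutoff, cutoff, 200_000)

print(exp_mean)  # close to 1 / rate
for cutoff in (10.0, 100.0, 1000.0):
    # Keeps increasing (roughly like log(cutoff)) instead of settling.
    print(cutoff, cauchy_abs_moment(cutoff))
```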

Multiple variables

A single scalar expected value is not defined for several variables at once. However, it is possible to find the expected value of one variable among many from the joint distribution function (JDF).

For $N$ continuous random variables $X_{1},\ldots,X_{N}$ with JDF $f(x_{1},\ldots,x_{N})$, where $\Omega_{k}$ denotes the range of $X_{k}$, the expected value of $X_{i}$ is

$$\text{E}[X_{i}]=\int_{\Omega_{N}}\ldots \int_{\Omega_{1}}x_{i}f(x_{1},\ldots,x_{N})\,dx_{1}\ldots dx_{N}$$
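For two variables the integral above can be approximated on a grid. The sketch below assumes the hypothetical joint density $f(x,y)=x+y$ on the unit square (it integrates to 1), for which $\text{E}[X]=\text{E}[Y]=7/12$ by direct integration. The function `marginal_mean` and its parameters are invented for this example.

```python
def marginal_mean(joint_pdf, index, bounds, steps=200):
    """Approximate E[X_index] for a two-variable joint density (midpoint rule)."""
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    dx = (x_hi - x_lo) / steps
    dy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * dx
        for j in range(steps):
            y = y_lo + (j + 0.5) * dy
            point = (x, y)
            # Integrand: the chosen coordinate times the joint density.
            total += point[index] * joint_pdf(x, y) * dx * dy
    return total

def joint_pdf(x, y):
    """Illustrative joint density on [0, 1] x [0, 1]."""
    return x + y

unit_square = ((0.0, 1.0), (0.0, 1.0))
mean_x = marginal_mean(joint_pdf, 0, unit_square)
mean_y = marginal_mean(joint_pdf, 1, unit_square)

print(mean_x, mean_y)  # both close to 7/12
```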

Properties

The expectation has some useful properties:

  • If $X>0$ then $\text{E}[X]>0$.
  • It is linear: $\text{E}[aX+bY]=a\text{E}[X]+b\text{E}[Y]$, where $a$ and $b$ are constants. This follows from the linearity of a series or integral.
  • It is monotonic: if $X\leq Y$, then $\text{E}[X]\leq \text{E}[Y]$.
  • If $X=Y$, then $\text{E}[X]=\text{E}[Y]$.
  • If $\text{E}[\lvert X \rvert]=0$ then $X=0$ almost surely (that is, with probability 1).
  • If $X=c$ for a constant $c$, then $\text{E}[X]=c$. As a consequence, since the expectation is a constant, the expectation operator is idempotent: $\text{E}[\text{E}[X]]=\text{E}[X]$.
  • $\text{E}[XY]\neq \text{E}[X]\text{E}[Y]$ in general. Equality is guaranteed if $X$ and $Y$ are independent variables, but it can also hold when they are dependent.
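The linearity and product properties can be checked exactly on small finite examples. The sketch below uses three hypothetical joint distributions: two independent fair dice, a pair with $Y=X$, and a dependent pair with $X$ uniform on $\{-1,0,1\}$ and $Y=X^{2}$. The helper names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

def expect(joint, g):
    """E[g(X, Y)] for a finite joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def mean_x(joint):
    return expect(joint, lambda x, y: x)

def mean_y(joint):
    return expect(joint, lambda x, y: y)

def mean_xy(joint):
    return expect(joint, lambda x, y: x * y)

faces = range(1, 7)
# Two independent fair dice.
independent = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
# Fully dependent: Y = X.
identical = {(x, x): Fraction(1, 6) for x in faces}
# Dependent, yet the product rule still holds: Y = X^2.
squared = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

a, b = 2, 3  # arbitrary constants for the linearity check
cases = [("independent", independent), ("identical", identical), ("squared", squared)]
for name, joint in cases:
    linear_ok = expect(joint, lambda x, y: a * x + b * y) == a * mean_x(joint) + b * mean_y(joint)
    product_ok = mean_xy(joint) == mean_x(joint) * mean_y(joint)
    print(name, linear_ok, product_ok)
# independent True True
# identical True False
# squared True True
```

Linearity holds in all three cases, while the product rule fails for `identical` and holds for `squared` even though $Y$ is fully determined by $X$ there.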

Expected value arrays

When dealing with a random vector $\mathbf{X}=(X_{1},\ldots,X_{N})$, the expected value vector or mean vector is the vector of expected values:

$$\text{E}[\mathbf{X}]=(\text{E}[X_{1}],\ldots,\text{E}[X_{N}])$$

In this case, the linearity property looks like

$$\text{E}[\mathrm{A}\mathbf{X}+\mathbf{b}]=\mathrm{A}\text{E}[\mathbf{X}]+\mathbf{b}$$

where $\mathrm{A}$ is an $N\times N$ matrix and $\mathbf{b}$ is an $N$-dimensional vector. Similar nomenclature applies to the expected value matrix.
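The vector form of linearity can be verified on a small finite example. The sketch below uses a made-up two-dimensional random vector with three outcomes, and an arbitrary matrix and offset. It computes $\text{E}[\mathrm{A}\mathbf{X}+\mathbf{b}]$ by transforming every outcome, then compares it with $\mathrm{A}\text{E}[\mathbf{X}]+\mathbf{b}$. All names and values are assumptions for illustration.

```python
from fractions import Fraction

def mat_vec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def mean_vector(pmf, transform=list):
    """Componentwise expected value of transform(X) for a finite pmf over tuples."""
    dim = len(transform(next(iter(pmf))))
    total = [Fraction(0)] * dim
    for outcome, p in pmf.items():
        total = [t + p * c for t, c in zip(total, transform(outcome))]
    return total

# Hypothetical random vector with three possible outcomes.
pmf = {(0, 0): Fraction(1, 2), (1, 2): Fraction(1, 4), (3, 1): Fraction(1, 4)}
A = [[2, 0], [1, -1]]
b = [1, 5]

lhs = mean_vector(pmf, lambda v: vec_add(mat_vec(A, v), b))  # E[AX + b]
rhs = vec_add(mat_vec(A, mean_vector(pmf)), b)               # A E[X] + b

print(mean_vector(pmf))  # [Fraction(1, 1), Fraction(3, 4)]
print(lhs == rhs)        # True
```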