Random variable


A random variable (RV) is a quantity whose value cannot be known before measurement because it depends on random events. What can be known is the probability distribution that the variable follows. The distribution does not tell us what the value is, but it does tell us what the value is likely to be.

Generally speaking, it is assumed that a random variable is associated with a "true value", that is, the actual, errorless value of the quantity that an object or phenomenon possesses and that we then take measurements of. Our measurements are realizations of the random variable associated with the true value, with an error being introduced by our measurement process. For instance, a plank of wood has a true length, and our measurements try to come as close as possible to that true value.
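The plank example can be sketched in a few lines of Python. This is only an illustration: the true length, the size of the error, and the choice of a Gaussian error model are all assumptions made here, not something the measurement process guarantees.

```python
import random
import statistics

# Hypothetical numbers, chosen only for illustration.
TRUE_LENGTH_M = 2.000   # the plank's "true value", in metres
NOISE_STD_M = 0.005     # assumed spread of the measurement error


def measure(rng: random.Random) -> float:
    """Return one realization: the true value plus a random error."""
    return TRUE_LENGTH_M + rng.gauss(0.0, NOISE_STD_M)


rng = random.Random(0)  # fixed seed so the run is reproducible
measurements = [measure(rng) for _ in range(1000)]

# Individual measurements scatter around the true value;
# their average is expected to land much closer to it.
sample_mean = statistics.fmean(measurements)
print(f"first measurement: {measurements[0]:.4f} m")
print(f"mean of 1000:      {sample_mean:.4f} m")
```

Each call to `measure` is one realization of the random variable. No single call returns the true length, but the spread of the results is governed by the assumed error distribution.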

It is also possible that the true value does not exist and that the quantity is fundamentally statistical. This may be something as simple as a die roll, which is exclusively random and has no true outcome, or as complicated as a quantum wave function.

More formally, a random variable is a function $X:\Omega\to E$ whose domain (called the sample space $\Omega$) may be either discrete or continuous and contains all possible outcomes of the random event, and whose codomain is a measurable space $E$. For instance, a coin flip may be described as the random variable $\text{Coin}:\{\text{Heads},\text{Tails}\}\to\{1,-1\}$. The benefit of this definition is that it allows binding non-mathematical concepts (like "heads", "tails", "win" or "loss") to mathematical ones (generally numbers).
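The "random variable as a function" view translates almost literally into code. The sketch below is a minimal illustration; the names `OMEGA` and `coin` are made up for this example, and a fair coin is assumed.

```python
import random

# Sample space: the non-mathematical outcomes of the experiment.
OMEGA = ("Heads", "Tails")


def coin(outcome: str) -> int:
    """The random variable: a plain function from outcomes to numbers."""
    return {"Heads": 1, "Tails": -1}[outcome]


rng = random.Random(42)  # fixed seed so the run is reproducible
outcomes = [rng.choice(OMEGA) for _ in range(10)]  # random events (fair coin assumed)
realizations = [coin(w) for w in outcomes]         # values taken by the RV

print(outcomes)
print(realizations)
```

Note that the randomness lives entirely in how the outcome is drawn. The function `coin` itself is deterministic: it only relabels outcomes as numbers.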

In many cases, the sample space is naturally numeric. In these cases, there is no need to "convert" the outcome's value and the random variable is simply the identity function, so that $\Omega\equiv E$. For instance, the sample space of a six-sided die roll is $\{ 1,2,3,4,5,6 \}$, which we keep as-is. Even when $\Omega$ isn't numeric, it's common (but not obligatory) for the random variable to be bijective, so that each distinct outcome has a distinct mathematical representation. In the coin flip above, "Heads" maps to $1$ and "Tails" maps to $-1$. This makes the two representations equivalent, which is convenient for clarity: you can say "heads" or "tails" to mean $1$ or $-1$ because they map uniquely. For these reasons, the distinction between $\Omega$ and $E$ is often left unspecified and the random variable is referred to using its outcomes (i.e., the elements of $\Omega$).

Random variables are typically written with a capital letter like $X$, whereas specific values that the variable takes (sometimes called realizations) are written with a lowercase letter like $x$.

A vector whose components are RVs is called a random vector. A matrix whose entries are RVs is called a random matrix.

Discrete random variables

A discrete random variable $X$ is defined over a discrete sample space $\Omega=\{ x_{1},\ldots,x_{N} \}$ where $N$ may be either finite or (countably) infinite. The probability that $X$ assumes the specific value $x_{i}$ is

$$P_{i}=P(X=x_{i})\quad\text{for }i\in \{ 1,\ldots,N \}$$

Each outcome is mapped to its probability by the probability mass function $p_{X}:\Omega\to[0,1]$, with $p_{X}(x_{i})=P_{i}$. The probabilities $\{ P_{1},\ldots,P_{N} \}$ associated with $X$ must sum to $1$. Common discrete probability distributions are the binomial distribution and the Poisson distribution.
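A probability mass function can be sketched as a lookup from each outcome to its probability. The example below is a minimal sketch with made-up names (`die_pmf`, `binomial_pmf`): it builds the PMF of a fair die, uses the standard binomial formula $P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}$, and checks that the probabilities over the whole sample space sum to one. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from math import comb

# PMF of a fair six-sided die: each outcome x_i maps to P_i = 1/6.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


half = Fraction(1, 2)

# Probabilities over the whole sample space must sum to 1.
print(sum(die_pmf.values()))                           # 1
print(sum(binomial_pmf(k, 4, half) for k in range(5)))  # 1

# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, half))                        # 3/8
```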

Continuous random variables

A continuous random variable $X$ is defined over a continuous sample space $\Omega=[a,b]$. The edges may or may not be included and the space may also be infinite (i.e. $a=-\infty$ and/or $b=+\infty$). Unlike with discrete RVs, it is not useful to assign a probability to a single specific value: since there are infinitely many possible values in a continuous sample space, the probability of any one of them (and of any finite number of them) is exactly zero. Heuristically, if we call $N$ the total number of possible, equally likely outcomes, the probability of seeing a specific outcome $x$ must behave like

$$\lim_{ N \to \infty } P_{x}=\lim_{ N \to \infty } \frac{1}{N}=0$$

We instead define the probability that $X$ will assume a value in a certain interval $[x_{1},x_{2}]$. We represent this probability as an integral over the interval $[x_{1},x_{2}]$:

$$P(x_{1}\leq X\leq x_{2})=\int_{x_{1}}^{x_{2}}f(x)\,dx$$

The function $f(x)$ is called the probability density function of the distribution. For a more general space (not necessarily one dimensional), the probability of $X$ being in a subset $A\subset\Omega$ is

$$P(X\in A)=\int_{x \in A}dp(x)=\int_{\text{all space}}\mathbf{I}_{A}(x)\,dp(x)$$

where $dp(x)$ is the infinitesimal probability, which is often but not always equal to $f(x)dx$, and $\mathbf{I}_{A}$ is the indicator function (see footnote 1), defined as

$$\mathbf{I}_{A}(x)=\begin{cases} 1 & x \in A \\ 0 & \text{otherwise} \end{cases}$$
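To see the indicator function at work, the sketch below integrates over the whole space and lets $\mathbf{I}_{A}$ cut out everything outside $A$. The setup is an assumption made for illustration: a uniform density on the unit square (so that $dp = f\,dx\,dy$ with $f=1$), and $A$ taken as the quarter disc $x^{2}+y^{2}\leq 1$, for which the exact answer is $\pi/4$. The integral is approximated by a simple midpoint rule written as two nested one-dimensional sums.

```python
import math


def indicator_A(x: float, y: float) -> float:
    """I_A for A = {(x, y) : x^2 + y^2 <= 1}, the quarter disc in the square."""
    return 1.0 if x * x + y * y <= 1.0 else 0.0


def density(x: float, y: float) -> float:
    """Assumed uniform density on the unit square [0, 1] x [0, 1]."""
    return 1.0


def probability_in_A(n: int = 400) -> float:
    """Midpoint-rule integral of I_A * f over the whole square,
    written as two nested 1D sums."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += indicator_A(x, y) * density(x, y)
    return total * h * h


p = probability_in_A()
print(f"P(X in A) ~ {p:.4f}, pi/4 = {math.pi / 4:.4f}")
```

The integration limits never mention $A$: the loops always cover the full square, and changing the region only means changing `indicator_A`.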

The most common continuous probability distribution is the Gaussian distribution.
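As a concrete case of the interval formula $P(x_{1}\leq X\leq x_{2})=\int_{x_{1}}^{x_{2}}f(x)\,dx$, the sketch below integrates a standard Gaussian density numerically over $[-1,1]$ and compares the result with the closed form $\operatorname{erf}(1/\sqrt{2})$. The helper names and the choice of the midpoint rule are assumptions of this example, not a prescribed method.

```python
import math


def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density f(x) of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def interval_probability(f, x1: float, x2: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [x1, x2] with the midpoint rule."""
    h = (x2 - x1) / steps
    return h * sum(f(x1 + (i + 0.5) * h) for i in range(steps))


p = interval_probability(gaussian_pdf, -1.0, 1.0)
exact = math.erf(1.0 / math.sqrt(2.0))  # closed form for this interval
print(f"numerical: {p:.6f}, closed form: {exact:.6f}")

# An interval of zero width carries zero probability.
print(interval_probability(gaussian_pdf, 0.5, 0.5))
```

The last line illustrates why single values are not assigned a probability: the density at $0.5$ is nonzero, yet the probability of hitting exactly $0.5$ is zero.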

Footnotes

  1. Using $\mathbf{I}_{A}$ is functionally just a different way to write an integral over $A$. The benefit is that you actually integrate over all space and $\mathbf{I}_{A}$ cuts out the parts that don't matter. This might make the integral nicer for analytical manipulation, as $N$-dimensional integrals over all space are usually easy to rewrite as nested 1D integrals.