The moment-generating function of the multivariate normal is known in closed form for the independent variable case and is a direct extension of the univariate normal:
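As a sketch of that closed form, assume independent components with means $\mu_i$ and variances $\sigma_i^2$, and write $t_1,\dots,t_N$ for the arguments of the MGF (a notation assumed here for illustration). The joint MGF then factors into a product of univariate normal MGFs:

$$M(t_1,\dots,t_N) = \prod_{i=1}^{N} \exp\!\left(\mu_i t_i + \tfrac{1}{2}\sigma_i^2 t_i^2\right) = \exp\!\left(\sum_{i=1}^{N}\mu_i t_i + \tfrac{1}{2}\sum_{i=1}^{N}\sigma_i^2 t_i^2\right)$$

Setting $N=1$ recovers the familiar univariate expression $\exp(\mu t + \tfrac{1}{2}\sigma^2 t^2)$.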
The multivariate normal distribution is, like its univariate sibling, arguably the most important multivariate distribution, because Gaussian random variables are so common. Both independent and dependent random variables come up in practice, so it is useful to discuss both cases.
Independence makes the covariance matrix diagonal, which effectively reduces it to a vector of variances.
The exponent $Q^2$ mentioned above is useful because its level sets are hypersurfaces of equal probability density. If you fix a value of $Q^2$ and find the locus of points $(x_1,\dots,x_N)$ that realize that value, you get a closed, well-behaved hypersurface on which the density is constant. If this sounds confusing, just know that in the simplest case of $N=2$ these are ordinary ellipses. To see it, write down the equation for $Q^2$:
$$Q^2 = \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}$$
This is the equation of an ellipse centered at $(\mu_1,\mu_2)$ with semiaxes $Q\sigma_1$ and $Q\sigma_2$, that is, the semiaxes $\sigma_1$ and $\sigma_2$ scaled by a factor $Q$. For higher $N$, the surfaces are hyperellipsoids. When they can be drawn (mostly just $N=2$, possibly $N=3$), they are a useful way to visualize the dispersion of the multivariate normal. Because the standard deviations set the semiaxes, $Q$ acts as the $N$-dimensional analog of a distance from the mean measured in standard deviations, and $Q^2$ as its square. Of course, the actual variances are still $\sigma_1^2,\dots,\sigma_N^2$, but $Q^2$ is an efficient way to package all that information in a single variable. In the $N=2$ case, since the semiaxes are scaled by $Q$, choosing $Q$ lets us draw the ellipse corresponding to whatever multiple of $\sigma$ we want: $Q^2=1$ gives the ellipse with semiaxes $(\sigma_1,\sigma_2)$, $Q^2=4$ gives the one with $(2\sigma_1,2\sigma_2)$, and so on.
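The scaling by $Q$ can be checked numerically. The following is a minimal sketch, assuming independent variables; the function names `q_squared` and `ellipse_point` and the sample values of the means and standard deviations are illustrative, not taken from the text. It parametrizes the ellipse with semiaxes $Q\sigma_1$ and $Q\sigma_2$ and evaluates $Q^2$ at several points on it.

```python
import math

def q_squared(x, mu, sigma):
    """Q^2 for independent variables: sum of squared standardized deviations."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigma))

def ellipse_point(q, mu, sigma, angle):
    """Point on the N=2 ellipse with semiaxes q*sigma_1 and q*sigma_2."""
    return (mu[0] + q * sigma[0] * math.cos(angle),
            mu[1] + q * sigma[1] * math.sin(angle))

mu = (1.0, -2.0)      # illustrative means (assumed values)
sigma = (0.5, 3.0)    # illustrative standard deviations (assumed values)

for q in (1.0, 2.0):
    # Sample eight points around the ellipse and compute Q^2 at each one.
    values = [q_squared(ellipse_point(q, mu, sigma, 2 * math.pi * k / 8), mu, sigma)
              for k in range(8)]
    # min and max should agree with q**2 up to rounding error.
    print(q, min(values), max(values))
```

Every sampled point on a given ellipse should return the same $Q^2$ up to floating-point rounding, which is what makes the ellipse a contour of constant density.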
The region inside the hyperellipsoid measures probability in the same way a cumulative distribution function does. The probability of a value falling inside the $Q^2=1$ hyperellipsoid is $P(Q^2\le 1)$. To evaluate this, we need the CDF of $Q^2$. As it happens, $Q^2$ is chi-square distributed with $N$ degrees of freedom. This is because $Q^2$ is the sum of squares of the standardized variables $(x_i-\mu_i)/\sigma_i$, each an independent standard Gaussian, and the sum of squares of $N$ such variables follows a $\chi^2_N$ distribution. Thus, we can find the probability analytically. For an $N=2$ ellipse, the $\chi^2_2$ CDF is just

$$P(Q^2 \le q) = 1 - e^{-q/2}$$

so $P(Q^2\le 1) = 1-e^{-1/2}\approx 0.39$ and $P(Q^2\le 4) = 1-e^{-2}\approx 0.86$.
These numbers are the multivariate analog of the classic 1-2-3σ rule of the Gaussian. Where $1\sigma \Rightarrow 68\%$ and $2\sigma \Rightarrow 95\%$ in a univariate Gaussian, $Q^2=1 \Rightarrow 39\%$ and $Q^2=4 \Rightarrow 86\%$ in a bivariate ($N=2$) Gaussian.
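These percentages can be reproduced with the standard library alone. The following is a minimal sketch, assuming $N=2$ (where the $\chi^2_2$ CDF is $1-e^{-q/2}$) for the ellipse and the usual error-function form for the univariate case; the function names are illustrative.

```python
import math

def prob_inside_ellipse(q_sq):
    """P(Q^2 <= q_sq) for N=2, using the chi-square CDF with 2 degrees of freedom."""
    return 1.0 - math.exp(-q_sq / 2.0)

def prob_within_k_sigma(k):
    """P(|x - mu| <= k*sigma) for a univariate Gaussian."""
    return math.erf(k / math.sqrt(2.0))

# Compare the univariate k-sigma interval with the bivariate Q = k ellipse.
for k in (1, 2, 3):
    print(f"k={k}: univariate {prob_within_k_sigma(k):.4f}, "
          f"bivariate ellipse {prob_inside_ellipse(k**2):.4f}")
```

For other values of $N$ the chi-square CDF no longer has this one-line form, so a statistics library would be the more natural choice there.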
If the variables are not independent, things get a lot more verbose, as covariance can no longer be ignored. Let's consider only $N=2$ variables. We can no longer use the simple formula for $Q^2$ and need the full one:
$$Q^2 = (\mathbf{x}-\boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})$$
In other words, we need the covariance matrix and its inverse:
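For the $N=2$ case these can be written out explicitly. The following is a sketch in terms of the standard deviations $\sigma_1,\sigma_2$ and the correlation coefficient $\rho$, so that the covariance is $\rho\sigma_1\sigma_2$ (a standard parametrization, assumed here):

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad \Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}$$

Substituting into the quadratic form gives

$$Q^2 = \frac{1}{1-\rho^2}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho\,(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]$$

which reduces to the independent-case expression when $\rho=0$.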
If we draw the ellipses traced by this $Q^2$, we find that they are now tilted. This tilt is quite important, as it visually represents the correlation between $x_1$ and $x_2$. The angle is determined by the correlation coefficient $\rho$ together with $\sigma_1$ and $\sigma_2$ (it is not simply proportional to $\rho$), and it is the major difference from the independent case above, where the ellipse was axis-aligned. When $\rho=0$, you recover an axis-aligned ellipse.
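To make the dependence of the tilt on $\rho$ concrete, here is a minimal sketch using the standard result that the ellipse's axes lie along the eigenvectors of $\Sigma$, so that for a $2\times 2$ covariance matrix the angle $\theta$ between the $x_1$ axis and the major axis satisfies $\tan 2\theta = 2\rho\sigma_1\sigma_2/(\sigma_1^2-\sigma_2^2)$. The function name `tilt_angle` and the sample values are illustrative.

```python
import math

def tilt_angle(sigma1, sigma2, rho):
    """Angle (radians) between the x1 axis and the ellipse's major axis."""
    cov12 = rho * sigma1 * sigma2
    # atan2 also covers sigma1 == sigma2, where tan(2*theta) is undefined.
    return 0.5 * math.atan2(2.0 * cov12, sigma1**2 - sigma2**2)

# Unequal standard deviations: the tilt grows with rho, but not linearly.
for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, math.degrees(tilt_angle(2.0, 1.0, rho)))

# Equal standard deviations: any positive rho gives the same 45-degree tilt.
for rho in (0.1, 0.5, 0.9):
    print(rho, math.degrees(tilt_angle(1.0, 1.0, rho)))
```

With equal standard deviations the tilt does not change with the size of $\rho$ at all; a stronger correlation instead shows up as a narrower ellipse.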
Figure (Multinormal equiprobability.svg): A general bivariate ellipse showing important points and lines. The red diagonals (passing through segments CC′ and DD′) are known as the regression lines of the distribution.
We can extract a lot of useful information with some geometry. The lines $x_1=\mu_1$ and $x_2=\mu_2$ intersect the ellipse at points