A chi-square test or $\chi^{2}$-test is a type of hypothesis test on a sample of Gaussian random variables. It is a Gaussian parameter test, but it's common enough to warrant its own article. Its popularity is due to the commonness of Gaussian samples, alongside the simplicity and usefulness of the test.
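Suppose the sample is composed of $N$ Gaussian random variables $x_{1},\ldots,x_{N}$, not necessarily iid, of which we know the variances $\sigma_{i}^{2}$, and we want to test that the means $\mu_{i}$ are equal to some values we're looking for, generally ones we expect from theory. Our hypotheses are
$$H_{0}:\mu_{i}=\mu_{0,i}\;\forall i,\qquad H_{1}:\mu_{i}\neq\mu_{0,i}\text{ for some }i$$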
We ask that each distribution individually has a mean equal to some value $\mu_{0,i}$. It is (almost) like $N$ simultaneous Z-tests for $N$ separate Gaussians. Instead of testing that a mean equals a test mean, you are testing that a vector of means equals a vector of test means. The test statistic is
$$t=\sum_{i=1}^{N} \frac{(x_{i}-\mu_{0,i})^{2}}{\sigma_{i}^{2}}\tag{1}$$
which follows a chi-square distribution with $N$ degrees of freedom. The critical region is defined as a one-tailed region for high values of $t$.
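
A minimal Python sketch of the test for the independent case: the statistic is the sum of squared deviations from the test means, each divided by the known variance, and it is compared against a chi-square distribution with one degree of freedom per variable. The function names `chi2_sf` and `chi2_test`, the significance level `alpha=0.05` and the sample numbers are illustrative assumptions, not part of the original text. The tail probability uses only the standard library, through the recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(t, dof):
    """P(T > t) for a chi-square distribution with integer dof >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = t / 2,
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    Intended for small dof (math.gamma overflows for very large arguments).
    """
    if t <= 0:
        return 1.0
    y = t / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += y**a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_test(x, mu0, sigma, alpha=0.05):
    """Return (t, p_value, reject) for independent Gaussians with known sigma."""
    t = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu0, sigma))
    p_value = chi2_sf(t, len(x))  # one-tailed: only high t is critical
    return t, p_value, p_value < alpha


# Hypothetical measurements, test means and known standard deviations.
x = [10.3, 19.6, 30.9, 39.8]
mu0 = [10.0, 20.0, 30.0, 40.0]
sigma = [0.5, 0.4, 0.6, 0.5]

t, p_value, reject = chi2_test(x, mu0, sigma)
print(f"t = {t:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```

If SciPy is available, `scipy.stats.chi2.sf(t, dof)` can replace the hand-written `chi2_sf` helper.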
The test differs from separate applications of the Z-test because it does not require the Gaussian RVs to be iid and also considers correlations. They can be differently distributed, so that all Gaussian distributions have different $\mu$ and $\sigma$ parameters, and the test still applies. They also don't need to be independent; in fact, if the RVs are dependent on each other, the test still holds as long as we use the more general test statistic
$$t=(\mathbf{x}-\boldsymbol{\mu}_{0})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_{0})$$
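where $\Sigma$ is the covariance matrix of the RVs, $\mathbf{x}=(x_{1},\ldots,x_{N})$ and $\boldsymbol{\mu}_{0}=(\mu_{0,1},\ldots,\mu_{0,N})$. It also follows the $\chi_{N}^{2}$ distribution and reduces to the previous form in the case of independent variables.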
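
The reduction to the independent case can be written out in one step. This is a short worked sketch, writing $\Sigma$ for the covariance matrix, $\mathbf{x}$ for the vector of sample values and $\boldsymbol{\mu}_{0}$ for the vector of test means. For independent variables the covariance matrix is diagonal, so its inverse is diagonal too and the cross terms vanish:

$$
\begin{aligned}
\Sigma&=\operatorname{diag}(\sigma_{1}^{2},\ldots,\sigma_{N}^{2})
\quad\Rightarrow\quad
(\Sigma^{-1})_{ij}=\frac{\delta_{ij}}{\sigma_{i}^{2}}\\
t&=(\mathbf{x}-\boldsymbol{\mu}_{0})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_{0})
=\sum_{i=1}^{N}\sum_{j=1}^{N}(x_{i}-\mu_{0,i})\,(\Sigma^{-1})_{ij}\,(x_{j}-\mu_{0,j})
=\sum_{i=1}^{N}\frac{(x_{i}-\mu_{0,i})^{2}}{\sigma_{i}^{2}}
\end{aligned}
$$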
Despite its popularity, the test has a few weaknesses:
- It requires the variances to be known. If only the mean is known, then a parameter test for the mean might be more appropriate.
- It is blind to the sign of the deviation from the mean. If the null hypothesis is rejected, the test can't tell you if the sample mean is an over- or an underestimate.
- Finally, it's a rather low power (confidence) test for low $N$ due to having broad tails.
## Applications
> [!example]- Compatibility of measurements
> Say you have $N$ measurements $x_{1},\ldots,x_{N}$ of the same quantity, each with known variance $\sigma_{i}^{2}$, and you want to check that they are all compatible with a single value $\mu_{0}$. The test statistic is
> $$t=\sum_{i=1}^{N} \frac{(x_{i}-\mu_{0})^{2}}{\sigma_{i}^{2}}$$
> The RV behind this number follows a $\chi_{N}^{2}$, as outlined above. If $\mu_{0}$ is not known, then you can substitute the sample mean $\bar{x}$:
> $$t=\sum_{i=1}^{N} \frac{(x_{i}-\bar{x})^{2}}{\sigma_{i}^{2}}$$
> The RV behind this number instead follows $\chi_{N-1}^{2}$, because using the sample mean removes one degree of freedom. If the null hypothesis is rejected, not all measurements are compatible, as $\mu_{i}\neq\mu_{0}$ for some $i$.

> [!example]- Testing an expected relation
> The test can also check whether your measurements follow an expected relation $f(x;\mathbf{a})$ with parameters $\mathbf{a}=(a_{1},\ldots,a_{k})$. We're taking the test means to be the values of our expected relation. Then, run the test using the statistic in $(1)$, using $f(x_{i};\mathbf{a})$ instead of $\mu_{0,i}$. If the null hypothesis is accepted, your formula is valid.
>
> Note that the degrees of freedom of $\chi^{2}$ are $N$ only if the parameters $\mathbf{a}$ are known from a source external to your measurements. If you instead calculate $\mathbf{a}$ from the same values $\{ x_{1},\ldots,x_{N} \}$ that you are testing, the $\chi^{2}$ actually has $N-k$ degrees of freedom, as you are imposing $k$ constraints due to the estimates. Since $\chi^{2}$ tests become less confident with fewer degrees of freedom, this makes your test a bit less reliable.

> [!example]- Pearson's goodness-of-fit test
> The $\chi^{2}$ test can also quantify how well an empirical distribution matches an expected distribution. Say you have a sample of $N$ data points $x_{1},\ldots,x_{N}$ sampled from a RV $X$. You [[histogram]] them to see their empirical distribution. The histogram has $\nu$ bins, containing $n_{1},\ldots,n_{\nu}$ samples respectively. The expected occupation of a bin in a histogram can be approximated with the [[binomial distribution]], as long as the [[Probability density function|PDF]] $f_{X}(x;\boldsymbol{\theta})$ is known (which it should be, since you're choosing it). The expected occupation is
> $$\mathrm{E}[n_{i}]=\mu_{n_{i}}\simeq Np_{i}$$
> where $p_{i}$ is the total probability of falling in a bin, as given by the PDF. It can be found either from the CDF $F_{X}$ or approximated from the PDF if the bins are thin enough:
> $$p_{i}=\int_{x_{i}}^{x_{i+1}} f_{X}(x;\boldsymbol{\theta})\,dx=F_{X}(x_{i+1};\boldsymbol{\theta})-F_{X}(x_{i};\boldsymbol{\theta})\simeq f_{X}(x_{i}^{*};\boldsymbol{\theta})\,\Delta_{i}$$
> Here $x_{i}$ and $x_{i+1}$ are the left and right edges of the bin, $\Delta_{i}=\lvert x_{i+1}-x_{i} \rvert$ is the width of the bin and $x_{i}^{*}=(x_{i+1}+x_{i})/2$ is the center of the bin. Now we are back in the state of a typical $\chi^{2}$ test, but instead of testing the means of the random sample, we are testing the means of the *bins* made from the random sample:
> $$H_{0}:n_{i}=\mu_{n_{i}}\;\forall i$$
> The test statistic is
> $$t=\sum_{i=1}^{\nu} \frac{(n_{i}-\mu_{n_{i}})^{2}}{\mu_{n_{i}}}$$
> Be careful to sum over $\nu$ bins, not $N$ samples like the other $\chi^{2}$ tests! This statistic follows a $\chi^{2}$ distribution. The degrees of freedom are $\nu-1$ if the parameters $\boldsymbol{\theta}=(\theta_{1},\ldots,\theta_{k})$ of $f_{X}(x;\boldsymbol{\theta})$ are known independently. If they are estimated from the test sample, the degrees are $\nu-1-k$ because you set $k$ constraints for the estimation. If you accept the null hypothesis, then the data follows the distribution $f_{X}$ you selected.

> [!example]- Independence test
> The $\chi^{2}$ test can also be used to determine the independence of two quantities. Say you have a sample of data $(x_{1},y_{1}),\ldots,(x_{N},y_{N})$ from two RVs $X$ and $Y$. For instance, this could be a sample of $N$ people's heights ($x$) and weights ($y$) and you want to see if the two are independent or not. Our metric of independence uses the fact that the [[joint distribution function]] of two independent variables is just the product of the two:
> $$f_{XY}(x,y)=f_{X}(x)f_{Y}(y)$$
> You then essentially just apply Pearson's goodness-of-fit test to see if the product of $f_{X}$ and $f_{Y}$ as found empirically is indeed $f_{XY}$. You proceed in a similar manner, but you have to work with a multidimensional histogram. In the example case, that is a 2D histogram with $\nu$ height bins and $\kappa$ weight bins. This kind of histogram can be represented as a $\nu\times\kappa$ table or matrix where each cell is the occupation $n_{ij}$ of the bin. Similarly to the Pearson test, we use the estimates for expected bin counts:
> $$\mathrm{E}[n_{ij}]=\mu_{n_{ij}}\simeq Np_{ij}=Np_{i:}p_{:j}$$
> The probability can be estimated from the data itself using the frequentist interpretation (observed events over total events), which involves finding the occupation of each row and column and dividing by the sample size. For example, the probability of a value of $Y$ falling in the $j$-th column (the $j$-th weight bin) is $p_{:j}=n_{:j}/N=(n_{1j}+n_{2j}+\ldots+n_{\nu j})/N$, where the subscript $:j$ indicates the sum of all values of the $j$-th column. Similarly, for the $i$-th row (the $i$-th height bin), $p_{i:}=n_{i:}/N=(n_{i1}+n_{i2}+\ldots+n_{i\kappa})/N$. Then, the mean bin occupations are
> $$\mu_{n_{ij}}=Np_{i:}p_{:j}=N \frac{n_{i:}}{N} \frac{n_{:j}}{N}=\frac{n_{i:}n_{:j}}{N}$$
> If the probability of falling in any given bin is small, you can use the Poisson distribution to state that $\sigma_{n_{ij}}^{2}=\mu_{n_{ij}}$. Moreover, since the Poisson distribution obeys the central limit theorem, if the bin counts are large enough (at least 5, ideally more), the bin occupation distribution becomes approximately Gaussian. The test therefore becomes a regular Pearson's goodness-of-fit test extended to multiple dimensions:
> $$t=\sum_{i=1}^{\nu}\sum_{j=1}^{\kappa} \frac{(n_{ij}-\mu_{n_{ij}})^{2}}{\mu_{n_{ij}}}$$
> This has a $\chi^{2}$ distribution of course, and has degrees of freedom $(\nu-1)(\kappa-1)$. If the null hypothesis is accepted, the two quantities are independent.
>
> In principle, extending this to testing the independence of more than two variables at the same time is easy. You just need one dimension per variable in the histogram, then calculate the probability and the test statistic in the same way, and add one more $(\text{bins}-1)$ factor to the degrees of freedom. In practice, this is problematic in all sorts of ways. For one, high-dimensional data explodes in complexity very fast and is incredibly difficult to visualize. Moreover, it's easier to make mistakes in indexing, both by hand and in programming. Finally, histograms suffer the "curse of dimensionality", which makes the bins become very sparse as the dimensions go up, which messes with the CLT assumption for the Poisson distribution. In general, it is possibly more sensible to only independence-test two variables at once for these reasons.
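
The independence test can be sketched in a few lines of Python: expected bin counts are the product of the row and column totals divided by the sample size, and the statistic sums the squared deviations over every cell of the 2D histogram. The function name `independence_statistic`, the table of counts and the 5% significance level are illustrative assumptions. The critical value 5.991 is the tabulated 95% quantile of a chi-square distribution with 2 degrees of freedom.

```python
def independence_statistic(table):
    """Return (t, dof) for a 2D histogram given as a list of rows of counts."""
    n_rows, n_cols = len(table), len(table[0])
    row_sums = [sum(row) for row in table]        # n_{i:}
    col_sums = [sum(col) for col in zip(*table)]  # n_{:j}
    total = sum(row_sums)                         # N

    t = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_sums[i] * col_sums[j] / total  # n_{i:} n_{:j} / N
            t += (table[i][j] - expected) ** 2 / expected

    dof = (n_rows - 1) * (n_cols - 1)
    return t, dof


# Hypothetical 2x3 table: 2 height bins (rows) by 3 weight bins (columns).
counts = [
    [30, 20, 10],
    [10, 20, 30],
]

t, dof = independence_statistic(counts)
critical = 5.991  # 95% quantile of chi-square with 2 degrees of freedom
print(t, dof, t > critical)  # prints: 20.0 2 True
```

Here every expected count is 20, comfortably above the rule-of-thumb minimum of 5, and the statistic lands far inside the critical region, so independence is rejected for this made-up table.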