A confidence interval for an estimate is an interval, built around the estimate, that is constructed to contain the true value of the parameter a specified fraction of the time across repeated measurements. That fraction is called the confidence level and, importantly, it is not the probability that the true value falls in any individual confidence interval. Rather, it refers to how frequently the intervals contain the true value across repeated measurements. A 95% confidence level does not say "there is a 95% chance that the true value is in this interval" but rather "across 100 different measurements with 100 different intervals, around 95 of them contain the true value." Confidence intervals are a fundamentally frequentist tool: they require (at least hypothetical) repetition of the same experiment to have meaning. As such, they express confidence in the repeated procedure more than in any single estimate.
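A small simulation makes the frequentist reading concrete. It is a sketch under assumptions that are not part of the text above: normally distributed data with a known standard deviation, and the interval built from the normal pivot discussed in the Pivots section. The helper names `z_interval` and `coverage`, as well as the parameter values, are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, gamma):
    """Interval for the mean of normal data with known sigma (hypothetical helper)."""
    c = NormalDist().inv_cdf((1 + gamma) / 2)  # P(-c <= Z <= c) = gamma
    half_width = c * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width


def coverage(mu=10.0, sigma=2.0, n=25, gamma=0.95, repeats=10_000, seed=0):
    """Fraction of repeated experiments whose interval contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, gamma)
        hits += low <= mu <= high  # each single interval either contains mu or not
    return hits / repeats


print(coverage())  # expected to be close to 0.95
```

Each individual interval either hits or misses; only the long-run fraction of hits approaches the confidence level.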
When several parameters are estimated at the same time, their individual intervals can be combined into a joint confidence region, although such regions are seldom used in practice.
Construction
Confidence intervals can be constructed in several ways, some more general than others. The general method requires choosing a confidence level and integrating the probability distribution of the estimator over a suitable region. In practice, special quantities called pivots are often used instead.
General method
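Call $\theta$ the population parameter to be estimated and $\hat\theta$ an estimator that we calculate on some sample. To proceed, we need to know the probability distribution that $\hat\theta$ follows, specifically its probability density function (or probability mass function) $f(\hat\theta; \theta)$. We choose a confidence level $\gamma$ by hand: typically, numbers near $1$ are chosen, such as $0.9$, $0.95$ and $0.99$.¹ Once $\gamma$ is fixed, the bounds $a$ and $b$ of the interval, which are functions of $\hat\theta$, are determined by the integral

$$P\big(a(\hat\theta) \le \theta \le b(\hat\theta)\big) = \int_{\{\hat\theta \,:\, a(\hat\theta) \le \theta \le b(\hat\theta)\}} f(\hat\theta; \theta)\, d\hat\theta = \gamma,$$

where $P$ is a probability taken over the sampling distribution of $\hat\theta$. This probability shows up as a frequency: if we repeat the experiment many times and construct the interval for each repetition, a fraction $\gamma$ of the intervals contains $\theta$.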
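The interval has two bounds, but the equation above is a single condition, so it does not fully determine them: one more condition is needed. Several such conditions exist, and the right one depends on the specifics of the problem. A common choice is to require the interval to be centered on $\hat\theta$ and symmetric, $a = \hat\theta - \delta$ and $b = \hat\theta + \delta$, so that $a$ and $b$ are equally distant from $\hat\theta$. Then the (half-)width $\delta$ is the only quantity left to determine, and since $\hat\theta - \delta \le \theta \le \hat\theta + \delta$ is equivalent to $\theta - \delta \le \hat\theta \le \theta + \delta$, the condition becomes

$$\int_{\theta - \delta}^{\theta + \delta} f(\hat\theta; \theta)\, d\hat\theta = \gamma.$$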
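The symmetric version of the general method can be sketched numerically: integrate the density of $\hat\theta$ over $[\theta - \delta, \theta + \delta]$ and adjust $\delta$ by bisection until the integral equals $\gamma$. This sketch assumes the density depends on $\theta$ only through $\hat\theta - \theta$ (a location family), so $\delta$ does not depend on the unknown $\theta$ and the integral can be evaluated at any convenient $\theta$. The Laplace density, Simpson's rule and the helper names are hypothetical choices, picked because the answer is known in closed form: $\delta = b \ln\big(1/(1-\gamma)\big)$ for a Laplace distribution with scale $b$.

```python
import math


def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def symmetric_half_width(density, theta, gamma, tol=1e-9):
    """Find delta such that the integral of density over [theta - delta, theta + delta] is gamma."""
    lo, hi = 0.0, 1.0
    while simpson(density, theta - hi, theta + hi) < gamma:  # bracket the root
        hi *= 2
    while hi - lo > tol:  # bisection: the integral grows with delta
        mid = (lo + hi) / 2
        if simpson(density, theta - mid, theta + mid) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sampling distribution of theta_hat: Laplace with location theta, scale b.
THETA, B = 0.0, 1.0


def laplace_density(x):
    return math.exp(-abs(x - THETA) / B) / (2 * B)


delta = symmetric_half_width(laplace_density, THETA, gamma=0.95)
print(delta, B * math.log(1 / (1 - 0.95)))  # numerical and closed-form values should agree closely
```

The resulting confidence interval for an observed estimate is then $[\hat\theta - \delta, \hat\theta + \delta]$.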
Pivots
The construction of a confidence interval typically makes use of a pivot (or pivotal quantity): a function of both the sample and the unknown parameter whose probability distribution is known and does not depend on that parameter.
Let's start with a practical example. A common pivot is the following one, defined for a random sample $X_1, \dots, X_n$ of iid normal random variables $\mathcal N(\mu, \sigma^2)$, for which we wish to estimate the mean $\mu$ through the sample mean $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$. The variance $\sigma^2$ may be known or unknown, and the pivot changes a bit between these two cases. If $\sigma^2$ is known, we define

$$Z = \frac{\bar X - \mu}{\sigma / \sqrt{n}},$$
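which follows a standard normal distribution. Because of this, the probability that a realization of $Z$ falls in $[-1, 1]$ is about 68.3%, a well-known property of the standard normal. Therefore, we can construct the confidence interval $[\bar X - c\,\sigma/\sqrt n,\ \bar X + c\,\sigma/\sqrt n]$ with $c = 1$, which has confidence level $\gamma \approx 0.683$. Conversely, since the standard normal distribution is fully known, we can choose $\gamma$ first and then calculate the scaling factor $c$ of the corresponding interval, namely the value for which $P(-c \le Z \le c) = \gamma$.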
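Going from $\gamma$ to $c$ only needs the standard normal quantile function, since $P(-c \le Z \le c) = 2\Phi(c) - 1$ gives $c = \Phi^{-1}\big((1 + \gamma)/2\big)$. A minimal sketch using Python's `statistics.NormalDist`; the helper names and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean


def scaling_factor(gamma):
    """c such that P(-c <= Z <= c) = gamma for a standard normal Z."""
    # P(-c <= Z <= c) = 2 * Phi(c) - 1, hence c = Phi^{-1}((1 + gamma) / 2).
    return NormalDist().inv_cdf((1 + gamma) / 2)


def known_sigma_interval(sample, sigma, gamma):
    """Interval [xbar - c * sigma / sqrt(n), xbar + c * sigma / sqrt(n)]."""
    n = len(sample)
    half_width = scaling_factor(gamma) * sigma / n ** 0.5
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width


print(scaling_factor(0.683))  # approximately 1
print(scaling_factor(0.95))   # approximately 1.96
print(known_sigma_interval([4.1, 5.3, 4.8, 5.9], sigma=0.5, gamma=0.95))
```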
If $\sigma^2$ is not known, we substitute it with the unbiased sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$. The pivot is now defined as

$$T = \frac{\bar X - \mu}{S / \sqrt{n}},$$
which instead follows a Student's t distribution with $n - 1$ degrees of freedom. This pivot is a bit more complicated, but since the PDF of $T$ is known, the confidence interval can be calculated in the same way as above. Notably, as the number of degrees of freedom grows, the Student's t distribution converges to the standard normal (because $S$ converges to $\sigma$), so for large $n$ the difference between the two pivots shrinks.
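More technically, this pivot has the property

$$P\left(t_{n-1,\alpha/2} \le T \le t_{n-1,1-\alpha/2}\right) = 1 - \alpha,$$

where $t_{n-1,p}$ is the $p$-quantile of the Student's t distribution with $n-1$ degrees of freedom and $\alpha = 1 - \gamma$. Since that distribution is symmetric around zero, $t_{n-1,\alpha/2} = -t_{n-1,1-\alpha/2}$. With some algebra, the previous property can be rewritten as

$$P\left(\bar X - t_{n-1,1-\alpha/2}\frac{S}{\sqrt n} \le \mu \le \bar X + t_{n-1,1-\alpha/2}\frac{S}{\sqrt n}\right) = 1 - \alpha.$$

Hence, the random interval with bounds

$$\bar X \pm t_{n-1,1-\alpha/2}\frac{S}{\sqrt n}$$

contains the mean $\mu$ with probability $1 - \alpha$.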
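The algebra behind the rewritten property, spelled out: start from the symmetric form of the pivot property (using $t_{n-1,\alpha/2} = -t_{n-1,1-\alpha/2}$), multiply every side by $S/\sqrt n > 0$, which preserves the inequalities, then subtract $\bar X$ and multiply by $-1$, which reverses them. Abbreviating $t = t_{n-1,1-\alpha/2}$:

$$
\begin{aligned}
1 - \alpha &= P\left(-t \le \frac{\bar X - \mu}{S/\sqrt n} \le t\right) \\
&= P\left(-t\,\frac{S}{\sqrt n} \le \bar X - \mu \le t\,\frac{S}{\sqrt n}\right) \\
&= P\left(-\bar X - t\,\frac{S}{\sqrt n} \le -\mu \le -\bar X + t\,\frac{S}{\sqrt n}\right) \\
&= P\left(\bar X - t\,\frac{S}{\sqrt n} \le \mu \le \bar X + t\,\frac{S}{\sqrt n}\right).
\end{aligned}
$$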
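In practice, given a particular dataset $x_1, \dots, x_n$, we calculate the confidence interval by replacing $\bar X$ and $S$ with their observed values $\bar x$ and $s$ for the data that we have:

$$\left[\bar x - t_{n-1,1-\alpha/2}\frac{s}{\sqrt n},\ \bar x + t_{n-1,1-\alpha/2}\frac{s}{\sqrt n}\right].$$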
This observed interval either contains the true value of $\mu$ or it does not; nothing random is left in it. The probability $1 - \alpha$ belongs to the procedure: if we collected many datasets and built an interval from each one as above, a fraction $1 - \alpha$ of those intervals would contain the true mean.
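The observed interval can be computed with the Python standard library alone, with one caveat: the `statistics` module has no Student's t quantile function. This sketch therefore adds hypothetical helpers (`t_pdf`, `t_cdf`, `t_quantile`, `t_interval`) that integrate the t density with Simpson's rule and invert the CDF by bisection. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would be the usual choice; the sample values below are made up.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the integral of the symmetric pdf from 0 to x (Simpson, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile t_{df, p} for 0.5 <= p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data, gamma=0.95):
    """Observed interval xbar +/- t_{n-1, 1-alpha/2} * s / sqrt(n)."""
    n = len(data)
    alpha = 1 - gamma
    t = t_quantile(1 - alpha / 2, n - 1)
    s = stdev(data)  # square root of the (n - 1)-denominator sample variance S^2
    xbar = fmean(data)
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


print(t_quantile(0.975, 9))  # about 2.262
print(t_interval([4.1, 5.3, 4.8, 5.9, 5.0, 4.6]))
```

As the degrees of freedom grow, `t_quantile(0.975, df)` approaches the normal value of about 1.96, which is the convergence between the two pivots mentioned earlier.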
Choosing $\gamma$ (or, equivalently, $\alpha$) is therefore crucial, and $\gamma$ is arbitrary: it is not an inherent property of the dataset or of the distribution. It's essentially a push-pull relation between precision and guarantees. On one hand, if $\gamma$ is very high, then basically every interval we construct will contain the true value. In theory that sounds great; in practice the price we pay is a huge interval. Indeed, what varying $\gamma$ does is essentially make the interval larger or smaller: the larger the interval, the more likely the true value is to fall in it, but the less precise it is. On the other hand, narrow intervals are very precise, but they have a high chance of being straight-up wrong. The values of $\gamma$ given before are a good compromise between reliable and "precise enough". For instance, a 95% interval is wrong about 1 time in 20.
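The trade-off can be tabulated for the known-variance normal pivot, where the half-width is $c\,\sigma/\sqrt n$ with $c = z_{(1+\gamma)/2}$ and the long-run miss rate is $1 - \gamma$. This is a sketch assuming that pivot; with the t pivot the multipliers are larger for small $n$. The function name `tradeoff_table` is hypothetical.

```python
from statistics import NormalDist


def tradeoff_table(gammas):
    """Rows (gamma, c, miss_rate) for the known-sigma normal pivot.

    The half-width is c * sigma / sqrt(n) with c = z_{(1 + gamma) / 2}, and in the
    long run a fraction 1 - gamma of such intervals miss the true value.
    """
    rows = []
    for gamma in gammas:
        c = NormalDist().inv_cdf((1 + gamma) / 2)
        rows.append((gamma, c, 1 - gamma))
    return rows


for gamma, c, miss in tradeoff_table([0.683, 0.90, 0.95, 0.99, 0.999]):
    print(f"gamma={gamma:<6} half-width = {c:.3f} * sigma/sqrt(n), misses about 1 in {1 / miss:.0f}")
```

Going from 95% to 99.9% buys far fewer misses but makes the interval about 1.7 times wider.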
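As for the endpoints, they depend on how $\alpha$ is split between the two tails of the distribution. The split is generally chosen to be symmetric, with $\alpha/2$ in each tail as above, and such intervals are called equi-tailed, but strictly speaking this is not necessary. We can generalize to

$$\left[\bar x - t_{n-1,1-\alpha_1}\frac{s}{\sqrt n},\ \bar x + t_{n-1,1-\alpha_2}\frac{s}{\sqrt n}\right],$$

with the only necessary condition being $\alpha_1 + \alpha_2 = \alpha$. Other notable choices are $\alpha_1 = 0$ and $\alpha_2 = 0$. Since $t_{n-1,1} = +\infty$, these respectively make the left and right endpoints infinitely far, so that the confidence interval has, respectively, only an upper or only a lower bound. These are called one-sided confidence intervals.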
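Splitting $\alpha$ into $\alpha_1$ and $\alpha_2$ can be sketched as follows. For brevity this uses the known-variance normal pivot, so $z$ quantiles replace the $t$ quantiles (with unknown $\sigma$, swap in $t_{n-1,p}$); the function name and inputs are hypothetical, with `se` standing for $\sigma/\sqrt n$.

```python
import math
from statistics import NormalDist


def split_tail_interval(xbar, se, alpha1, alpha2):
    """Interval [xbar - z_{1-alpha1} * se, xbar + z_{1-alpha2} * se], coverage 1 - alpha1 - alpha2.

    alpha1 = 0 sends the left endpoint to -infinity and alpha2 = 0 sends the
    right endpoint to +infinity, giving a one-sided interval.
    """
    if alpha1 < 0 or alpha2 < 0 or not 0 < alpha1 + alpha2 < 1:
        raise ValueError("need alpha1, alpha2 >= 0 and 0 < alpha1 + alpha2 < 1")
    z = NormalDist().inv_cdf  # standard normal quantile function
    lower = -math.inf if alpha1 == 0 else xbar - z(1 - alpha1) * se
    upper = math.inf if alpha2 == 0 else xbar + z(1 - alpha2) * se
    return lower, upper


print(split_tail_interval(10.0, 1.0, 0.025, 0.025))  # equi-tailed 95%
print(split_tail_interval(10.0, 1.0, 0.0, 0.05))     # one-sided: only an upper bound
print(split_tail_interval(10.0, 1.0, 0.05, 0.0))     # one-sided: only a lower bound
```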
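Exact confidence intervals, whose coverage is exactly $1 - \alpha$ for every sample size, are few and far between. Thankfully, approximate intervals are rather easy to find. A common approximate interval for some parameter $\theta$ is given by the Wald pivot. It is based on a consistent estimator $\hat\theta$ whose standardized version is approximately standard-normally distributed for large sample sizes:

$$\frac{\hat\theta - \theta}{\widehat{\mathrm{se}}(\hat\theta)} \approx \mathcal N(0, 1)$$

for all $\theta$, where $\widehat{\mathrm{se}}(\hat\theta)$ is the (estimated) standard error of $\hat\theta$. The corresponding confidence interval has bounds

$$\hat\theta \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\theta),$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution.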
The benefit of this pivot is that the central limit theorem guarantees the approximate normality in many cases, in particular whenever $\hat\theta$ is the sample mean of iid variables with finite variance.
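A classic application is a binomial proportion: $\hat\theta = k/n$ is the sample mean of $n$ Bernoulli variables, so the central limit theorem applies, and its estimated standard error is $\sqrt{\hat\theta(1-\hat\theta)/n}$. A minimal sketch; the function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def wald_proportion_interval(successes, n, gamma=0.95):
    """Wald interval theta_hat +/- z_{1-alpha/2} * se_hat for a binomial proportion."""
    p_hat = successes / n                          # sample mean of n Bernoulli variables
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)  # z_{1 - alpha/2}
    return p_hat - z * se_hat, p_hat + z * se_hat


print(wald_proportion_interval(40, 100))  # roughly (0.304, 0.496)
```

The interval leans entirely on the large-sample approximation: with `successes = 0` the estimated standard error is zero and the interval collapses to a single point, a known failure mode for small samples or proportions near 0 or 1.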
Footnotes
¹ A notable exception is when $\hat\theta$ is known to follow a Gaussian distribution, in which case $\gamma = 0.683$ is also common, as it corresponds to the interval spanned by one standard deviation on either side of the mean.