Cramer-Rao inequality


The Cramer-Rao (or Cramer-Rao-Fréchet) inequality gives a lower bound on the variance of an estimator. Given an unbiased estimator $\hat{\theta}$ of a population parameter $\theta$, its variance is bounded by

$$\sigma^{2}_{\hat{\theta}}\geq\frac{1}{\mathrm{E}\left[ - \dfrac{d^{2}\log \mathcal{L}}{d\theta ^{2}} \right]}$$

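where $\mathrm{E}$ denotes the expected value and $\mathcal{L}$ is the likelihood function. $\mathcal{L}$ is assumed twice differentiable with respect to $\theta$ and is evaluated on the same sample used to compute the estimator. The inequality is saturated (the bound is attained) when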

$$\frac{ d \log \mathcal{L} }{ d \theta } =k(\hat{\theta}-\theta)$$

where $k$ is a constant or a function of $\theta$ alone (it must not depend on the sample). In this case $\hat{\theta}$ is called a minimum-variance (or efficient) estimator, and it is said to achieve the Cramer-Rao lower bound (CRLB).
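As a worked illustration, consider a standard example chosen here for concreteness. Take $N$ independent observations $x_1,\ldots,x_N$ from a normal distribution with unknown mean $\theta=\mu$ and known standard deviation $\sigma$, and use the sample mean $\bar{x}$ as the estimator:

$$\begin{align}
\log\mathcal{L}&=-\frac{N}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}(x_{i}-\mu)^{2}\\
\frac{d\log\mathcal{L}}{d\mu}&=\frac{1}{\sigma^{2}}\sum_{i=1}^{N}(x_{i}-\mu)=\frac{N}{\sigma^{2}}(\bar{x}-\mu)\\
\frac{d^{2}\log\mathcal{L}}{d\mu^{2}}&=-\frac{N}{\sigma^{2}}\quad\Longrightarrow\quad\sigma^{2}_{\bar{x}}\geq\frac{\sigma^{2}}{N}
\end{align}$$

The score has the saturating form above with $k=N/\sigma^{2}$. Since $\sigma^{2}_{\bar{x}}=\sigma^{2}/N$ exactly, the sample mean attains the CRLB. More generally, when an unbiased estimator saturates the bound, $k$ equals $\mathrm{E}[-d^{2}\log\mathcal{L}/d\theta^{2}]$.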

The inequality has two generalizations. First, the estimator may be biased, with bias $b(\theta)$. Second, instead of the parameter $\theta$ itself, we may be estimating a function $\tau(\theta)$ of it. The general estimator is therefore $\hat{\tau}$, with $\mathrm{E}[\hat{\tau}]=\tau(\theta)+b(\theta)$, and for it the inequality becomes

$$\sigma^{2}_{\hat{\tau}}\geq \frac{\left( \dfrac{d\tau}{d\theta}+ \dfrac{db}{d\theta} \right)^{2}}{\mathrm{E}\left[ - \dfrac{d^{2}\log \mathcal{L}}{d\theta ^{2}} \right]}$$

This reduces to the previous form when $\tau$ is the identity function ($\tau(\theta)=\theta$) and the bias is either zero ($b=0$) or independent of $\theta$ ($db/d\theta=0$).
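The general form can be checked numerically. The sketch below makes several illustrative assumptions that are not part of the note. It uses $N$ independent normal draws with unknown mean $\theta$ and known standard deviation. The estimator is a deliberately biased "shrinkage" estimator $\hat{\tau}=c\,\bar{x}$ for $\tau(\theta)=\theta$, where $c$ is a hypothetical factor. Its bias is $b(\theta)=(c-1)\theta$, so $d\tau/d\theta+db/d\theta=c$ and the bound becomes $c^{2}\sigma^{2}/N$. The function name `shrinkage_crlb_check` and all parameter values are hypothetical choices for illustration.

```python
import random
import statistics


def shrinkage_crlb_check(theta=2.0, sigma=1.5, n=25, c=0.8, trials=40_000, seed=0):
    """Monte Carlo check of the general Cramer-Rao bound for a biased estimator.

    Hypothetical model (illustrative only): n i.i.d. normal draws with unknown
    mean theta and known standard deviation sigma. The estimator is the
    shrunken sample mean tau_hat = c * x_bar, targeting tau(theta) = theta.
    """
    rng = random.Random(seed)
    # One estimate per simulated sample of size n
    estimates = [
        c * statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(trials)
    ]

    fisher_info = n / sigma**2          # E[-d^2 log L / dtheta^2] for this model
    dtau = 1.0                          # tau(theta) = theta
    dbias = c - 1.0                     # b(theta) = (c - 1) * theta
    bound = (dtau + dbias) ** 2 / fisher_info

    var_mc = statistics.pvariance(estimates)   # spread around the estimator's own mean
    bias_mc = statistics.fmean(estimates) - theta
    return var_mc, bound, bias_mc


var_mc, bound, bias_mc = shrinkage_crlb_check()
print(f"Monte Carlo variance: {var_mc:.5f}")
print(f"Cramer-Rao bound:     {bound:.5f}")    # c^2 sigma^2 / n
print(f"Empirical bias:       {bias_mc:+.4f}")  # theory: (c - 1) * theta
```

The score of this model is proportional to $\hat{\tau}-\mathrm{E}[\hat{\tau}]$, so the bound is attained even though the estimator is biased. The lower variance is paid for with bias, so the mean squared error is not necessarily smaller than that of $\bar{x}$.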

> We start from the expected value of the estimator,
> $$\mathrm{E}[\hat{\tau}]=\int_{\Omega_{N}}\ldots \int_{\Omega_{1}}\hat{\tau}(\mathbf{x})\,f(\mathbf{x};\boldsymbol{\phi})\,dx_{1}\ldots dx_{N}$$
> using the definition of expected value over multiple dimensions. Here $f(\mathbf{x};\boldsymbol{\phi})$ is the [[joint distribution function]] (JDF) of the sample, where $\boldsymbol{\phi}$ is the set of parameters that defines it. $\boldsymbol{\phi}$ includes $\theta$, and the other parameters (if any) are assumed fixed. Hence we write $f(\mathbf{x};\theta)$ to make clear that $f$ is a function of $\theta$ only. Furthermore, since the sample is fixed and we are analyzing the parameter (and its estimator), it is useful to reinterpret the JDF as the [[likelihood]] function $\mathcal{L}(\theta;\mathbf{x})$. Finally, for brevity, we write
> $$\int_{\Omega_{N}}\ldots \int_{\Omega_{1}}\equiv \int_{\boldsymbol{\Omega}}\quad\text{and}\quad dx_{1}\ldots dx_{N}\equiv d\mathbf{x}$$
> With all this, we can write
> $$\mathrm{E}[\hat{\tau}]=\int_{\boldsymbol{\Omega}}\hat{\tau}\,\mathcal{L}(\theta;\mathbf{x})\,d\mathbf{x}=\tau(\theta)+b(\theta)$$

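> We want to calculate the variance of $\hat{\tau}$, which by definition is $\sigma^{2}_{\hat{\tau}}=\mathrm{E}[(\hat{\tau}-\mathrm{E}[\hat{\tau}])^{2}]=\mathrm{E}[(\hat{\tau}-\tau-b)^{2}]$. To start, we differentiate $\mathrm{E}[\hat{\tau}]$ with respect to $\theta$. The estimator $\hat{\tau}$ depends only on the sample, so the derivative acts on $\mathcal{L}$ alone. We move the derivative inside the integral, which is allowed under the usual regularity conditions (e.g. when the support of $\mathcal{L}$ does not depend on $\theta$), and use $d\mathcal{L}/d\theta=\mathcal{L}\,d\log\mathcal{L}/d\theta$:
> $$\frac{d\mathrm{E}[\hat{\tau}]}{d\theta}=\int_{\boldsymbol{\Omega}}\hat{\tau}\frac{ d \mathcal{L} }{ d \theta }d\mathbf{x}=\int_{\boldsymbol{\Omega}} \hat{\tau}\frac{ d \log \mathcal{L} }{ d \theta }\mathcal{L}\ d\mathbf{x}=\mathrm{E}\left[ \hat{\tau} \frac{ d \log \mathcal{L} }{ d \theta } \right]$$
> But also, since $\mathrm{E}[\hat{\tau}]=\tau(\theta)+b(\theta)$,
> $$\frac{d\mathrm{E}[\hat{\tau}]}{d\theta}=\frac{d\tau}{d\theta}+\frac{db}{d\theta}$$
> Since $\mathcal{L}$ is a JDF, it must be normalized, which means $\int_{\boldsymbol{\Omega}}\mathcal{L}(\theta;\mathbf{x})\,d\mathbf{x}=1$. Taking the $\theta$-derivative of this condition yields
> $$\int_{\boldsymbol{\Omega}}\frac{ d \mathcal{L} }{ d \theta }d\mathbf{x} =\int_{\boldsymbol{\Omega}}\frac{ d \log\mathcal{L} }{ d \theta }\mathcal{L}\ d\mathbf{x}=\mathrm{E}\left[ \frac{ d \log\mathcal{L} }{ d \theta } \right]=0\tag{Norm}$$
> Since $\tau+b$ (unlike $\hat{\tau}$) is a function of $\theta$ only, we can multiply both sides by it to get
> $$\mathrm{E}\left[ (\tau+b)\frac{ d \log\mathcal{L} }{ d \theta } \right]=(\tau+b)\,\mathrm{E}\left[ \frac{ d \log\mathcal{L} }{ d \theta } \right]=0$$
> Subtracting this from $\mathrm{E}[\hat{\tau}\,d\log\mathcal{L}/d\theta]=d\tau/d\theta+db/d\theta$ (using the linearity of the expectation operator) gives
> $$\mathrm{E}\left[ (\hat{\tau}-\tau-b)\frac{ d \log \mathcal{L} }{ d \theta } \right]=\frac{d\tau}{d\theta}+ \frac{db}{d\theta}$$
> By the Cauchy-Schwarz inequality, $\mathrm{E}[XY]^{2}\leq \mathrm{E}[X^{2}]\,\mathrm{E}[Y^{2}]$. Taking $X=\hat{\tau}-\tau-b$ and $Y=d\log\mathcal{L}/d\theta$, we get
> $$\left( \frac{d\tau}{d\theta}+ \frac{db}{d\theta} \right)^{2}\leq\mathrm{E}\left[ (\hat{\tau}-\tau-b)^{2} \right]\mathrm{E}\left[ \left( \frac{ d \log\mathcal{L} }{ d \theta } \right)^{2} \right]$$
> and so, recognizing the variance, we can write
> $$\sigma ^{2}_{\hat{\tau}}\geq\frac{\left( \dfrac{d\tau}{d\theta}+ \dfrac{db}{d\theta} \right)^{2}}{\mathrm{E}\left[ \left( \dfrac{ d \log\mathcal{L} }{ d \theta } \right)^{2} \right]}$$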

> Taking the $\theta$-derivative of equation $(\text{Norm})$ once more, and using $d\mathcal{L}/d\theta=\mathcal{L}\,d\log\mathcal{L}/d\theta$, we find
> $$\begin{align}
> \int_{\boldsymbol{\Omega}}\left[ \frac{ d \mathcal{L} }{ d \theta } \frac{ d \log \mathcal{L} }{ d \theta } +\mathcal{L}\frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]d\mathbf{x}&=\int_{\boldsymbol{\Omega}}\mathcal{L}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2}+\frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]d\mathbf{x} \\
> &=\mathrm{E}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2} \right]-\mathrm{E}\left[ - \frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]=0
> \end{align}$$
> We can thus state that
> $$\mathrm{E}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2} \right]=\mathrm{E}\left[ - \frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]$$
> and hence, finally,
> $$\sigma ^{2}_{\hat{\tau}}\geq\frac{\left( \dfrac{d\tau}{d\theta}+ \dfrac{db}{d\theta} \right)^{2}}{\mathrm{E}\left[ - \dfrac{ d ^{2}\log\mathcal{L} }{ d \theta ^{2} } \right]}$$
> which completes our proof.
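The identity $\mathrm{E}[(d\log\mathcal{L}/d\theta)^{2}]=\mathrm{E}[-d^{2}\log\mathcal{L}/d\theta^{2}]$ used in the last step can be checked numerically for a concrete model. The sketch below assumes a single Poisson observation with rate $\lambda$, an illustrative choice that is not part of the note. For this model $d\log\mathcal{L}/d\lambda=x/\lambda-1$ and $d^{2}\log\mathcal{L}/d\lambda^{2}=-x/\lambda^{2}$. Both expectations are computed exactly by summing over the probability mass function. The sum is truncated at a hypothetical cutoff `x_max`, beyond which the remaining tail is negligible for the rates tried. The function name `poisson_information_both_ways` is also hypothetical.

```python
import math


def poisson_information_both_ways(lam, x_max=200):
    """Return (E[score^2], E[-d^2 log L / dlam^2]) for one Poisson(lam) observation.

    Expectations are sums over x = 0..x_max of pmf(x) * g(x); x_max is an
    illustrative truncation chosen so the neglected tail is negligible here.
    """
    e_score_sq = 0.0
    e_neg_curv = 0.0
    for x in range(x_max + 1):
        # log pmf = x log(lam) - lam - log(x!), evaluated in log space for stability
        pmf = math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))
        score = x / lam - 1.0          # d log L / d lam
        neg_curvature = x / lam**2     # -d^2 log L / d lam^2
        e_score_sq += pmf * score**2
        e_neg_curv += pmf * neg_curvature
    return e_score_sq, e_neg_curv


for lam in (0.5, 3.5, 20.0):
    from_score, from_curv = poisson_information_both_ways(lam)
    print(f"lam={lam:5.1f}  E[S^2]={from_score:.10f}  "
          f"E[-S']={from_curv:.10f}  1/lam={1 / lam:.10f}")
```

Both columns should agree with each other and with $1/\lambda$, the Fisher information of a single Poisson observation.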