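The Cramér-Rao (or Cramér-Rao-Fréchet) inequality gives a lower bound on the variance of an estimator. Given an unbiased estimator $\hat{\theta}$ for a population parameter $\theta$, its variance is bounded by
$$\sigma ^{2}_{\hat{\theta}}\geq\frac{1}{\mathrm{E}\left[ - \dfrac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]}$$
where $\mathrm{E}$ is the expected value and $\mathcal{L}$ is the twice-differentiable likelihood function evaluated over the same sample as the estimator. The inequality is saturated when
$$\frac{ d \log \mathcal{L} }{ d \theta }=A(\theta)(\hat{\theta}-\theta)$$
where $A$ is either a constant or a function of $\theta$. In this case, $\hat{\theta}$ is said to be a minimum-variance estimator, or to achieve the Cramér-Rao lower bound (CRLB).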
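As a quick illustration (a standard textbook case, not taken from the text above), consider $N$ independent draws $x_{i}$ from a Gaussian with unknown mean $\mu$ and known variance $\sigma^{2}$, with the sample mean $\bar{x}$ as the estimator of $\mu$. The symbols $N$, $\mu$, $\sigma$ and $\bar{x}$ are assumptions of this example. The log-likelihood and its first two derivatives are
$$\log \mathcal{L}=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}(x_{i}-\mu)^{2}+\text{const},\qquad \frac{ d \log \mathcal{L} }{ d \mu }=\frac{N}{\sigma^{2}}(\bar{x}-\mu),\qquad \frac{ d ^{2}\log \mathcal{L} }{ d \mu ^{2} }=-\frac{N}{\sigma^{2}}$$
The bound is therefore $\sigma^{2}/N$, which is exactly the variance of $\bar{x}$. The derivative of $\log\mathcal{L}$ is proportional to $\bar{x}-\mu$ through the factor $N/\sigma^{2}$, which does not depend on the sample, so in this example the sample mean achieves the CRLB.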
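The inequality has two generalizations. First, the estimator may be biased, with bias $b(\theta)$. Second, instead of estimating the parameter $\theta$ itself, we may be estimating a function $\tau(\theta)$ of the parameter. The general estimator is therefore $\hat{\tau}$, with $\mathrm{E}[\hat{\tau}]=\tau(\theta)+b(\theta)$. With this estimator, the inequality becomes
$$\sigma ^{2}_{\hat{\tau}}\geq\frac{\left( \dfrac{d\tau}{d\theta}+ \dfrac{db}{d\theta} \right)^{2}}{\mathrm{E}\left[ - \dfrac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]}$$
This form reduces to the previous one when the estimator is unbiased ($b=0$) or its bias is constant with respect to $\theta$ ($db/d\theta=0$), and $\tau$ is the identity function ($\tau(\theta)=\theta$).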
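The generalized bound can be checked numerically. The following is a minimal sketch under assumptions that are not in the text: Gaussian data with known standard deviation, $\tau(\theta)=\theta$, and a hypothetical shrinkage estimator $c\,\bar{x}$ whose bias is $b(\theta)=(c-1)\theta$. The function name `generalized_crlb` and all numerical values are illustrative.

```python
import random
import statistics


def generalized_crlb(dtau_dtheta, db_dtheta, fisher_info):
    """Bound (dtau/dtheta + db/dtheta)^2 / E[-d^2 log L / dtheta^2]."""
    return (dtau_dtheta + db_dtheta) ** 2 / fisher_info


# Illustrative values (assumptions, not from the text)
theta, sigma, n, trials, c = 2.0, 1.5, 20, 5000, 0.8
rng = random.Random(0)

# Biased estimator: shrunken sample mean c * xbar, so b(theta) = (c - 1) * theta
estimates = []
for _ in range(trials):
    sample = [rng.gauss(theta, sigma) for _ in range(n)]
    estimates.append(c * statistics.fmean(sample))

# E[-d^2 log L / dtheta^2] for a Gaussian mean with known sigma
fisher_info = n / sigma**2
bound = generalized_crlb(1.0, c - 1.0, fisher_info)
empirical_var = statistics.pvariance(estimates)

print(f"bound = {bound:.5f}, empirical variance = {empirical_var:.5f}")
```

In this setup the exact variance of the estimator is $c^{2}\sigma^{2}/N$, which coincides with the bound, so the two printed numbers should agree up to Monte Carlo noise.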
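> The expected value of the estimator is
> $$\mathrm{E}[\hat{\tau}]=\int_{\Omega_{N}}\ldots \int_{\Omega_{1}}\hat{\tau}(\mathbf{x})f(\mathbf{x};\boldsymbol{\phi})\ dx_{1}\ldots dx_{N}$$
> using the definition of expected value over multiple dimensions. $f(\mathbf{x};\boldsymbol{\phi})$ is the [[joint distribution function]] (JDF) of the sample, where $\boldsymbol{\phi}$ is the set of parameters that defines it. $\boldsymbol{\phi}$ includes $\theta$; the others (if any) are assumed fixed. Hence, we write $f(\mathbf{x};\theta)$ instead, to make clear that $f$ is a function of $\theta$ only. Furthermore, since the sample is fixed and we are analyzing the parameter (and its estimator), it is useful to reinterpret the JDF as the [[likelihood]] function $\mathcal{L}(\theta;\mathbf{x})$. Finally, for brevity, we write
> $$\int_{\Omega_{N}}\ldots \int_{\Omega_{1}}\equiv \int_{\boldsymbol{\Omega}}\quad\text{and}\quad dx_{1}\ldots dx_{N}\equiv d\mathbf{x}$$
> With all this, we can write
> $$\mathrm{E}[\hat{\tau}]=\int_{\boldsymbol{\Omega}}\hat{\tau}\mathcal{L}\ d\mathbf{x}=\tau(\theta)+b(\theta)$$
> We want to bound the variance of $\hat{\tau}$, written here as $\sigma ^{2}_{\hat{\tau}}=\mathrm{E}[(\hat{\tau}-\tau)^{2}]$. (Strictly, this is the mean squared deviation about $\tau$, which equals the variance when $b=0$; for a biased estimator, the same steps with $\tau$ replaced by $\mathrm{E}[\hat{\tau}]=\tau+b$ bound the variance itself.) To start, we find the derivative, noting that $\hat{\tau}$ depends on the sample only and not on $\theta$:
> $$\frac{d\mathrm{E}[\hat{\tau}]}{d\theta}=\int_{\boldsymbol{\Omega}}\hat{\tau}\frac{ d \mathcal{L} }{ d \theta }d\mathbf{x}=\int_{\boldsymbol{\Omega}} \hat{\tau}\frac{ d \log \mathcal{L} }{ d \theta }\mathcal{L}\ d\mathbf{x}=\mathrm{E}\left[ \hat{\tau} \frac{ d \log \mathcal{L} }{ d \theta } \right]$$
> But also, since $\mathrm{E}[\hat{\tau}]=\tau+b$,
> $$\mathrm{E}\left[ \hat{\tau} \frac{ d \log \mathcal{L} }{ d \theta } \right]=\frac{d\mathrm{E}[\hat{\tau}]}{d\theta}=\frac{d\tau}{d\theta}+ \frac{db}{d\theta}$$
> Since $\mathcal{L}$ is a JDF, it must be normalized, which means $\int_{\boldsymbol{\Omega}}\mathcal{L}(\theta;\mathbf{x})d\mathbf{x}=1$. Taking the $\theta$-derivative of this condition yields
> $$\int_{\boldsymbol{\Omega}}\frac{ d \mathcal{L} }{ d \theta }d\mathbf{x} =\int_{\boldsymbol{\Omega}}\frac{ d \log\mathcal{L} }{ d \theta }\mathcal{L}\ d\mathbf{x}=0\tag{Norm}$$
> Since $\tau$ (not $\hat{\tau}$!) is a function of $\theta$ only, we can multiply both sides by it to get
> $$\int_{\boldsymbol{\Omega}}\tau\frac{ d \log\mathcal{L} }{ d \theta }\mathcal{L}\ d\mathbf{x}=\mathrm{E}\left[ \tau \frac{ d \log \mathcal{L} }{ d \theta } \right]=0$$
> The difference between the previous expectation and this one is (using the linearity of the expectation operator):
> $$\mathrm{E}\left[ (\hat{\tau}-\tau)\frac{ d \log \mathcal{L} }{ d \theta } \right]=\frac{d\tau}{d\theta}+ \frac{db}{d\theta}$$
> Invoking the Schwarz inequality, we can state that $\mathrm{E}[XY]^{2}\leq \mathrm{E}[X^{2}]\,\mathrm{E}[Y^{2}]$ for any two random variables $X$ and $Y$, so
> $$\left( \frac{d\tau}{d\theta}+ \frac{db}{d\theta} \right)^{2}\leq \mathrm{E}[(\hat{\tau}-\tau)^{2}]\ \mathrm{E}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2} \right]$$
> and so, recognizing $\mathrm{E}[(\hat{\tau}-\tau)^{2}]$ as $\sigma ^{2}_{\hat{\tau}}$, we can write
> $$\sigma ^{2}_{\hat{\tau}}\geq\frac{\left( \dfrac{d\tau}{d\theta}+ \dfrac{db}{d\theta} \right)^{2}}{\mathrm{E}\left[ \left( \dfrac{ d \log\mathcal{L} }{ d \theta } \right)^{2} \right]}$$
> From equation (Norm), we can take the $\theta$-derivative to find
> $$\begin{align}
> \int_{\boldsymbol{\Omega}}\left[ \frac{ d \mathcal{L} }{ d \theta } \frac{ d \log \mathcal{L} }{ d \theta } +\mathcal{L}\frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]d\mathbf{x}&=\int_{\boldsymbol{\Omega}}\mathcal{L}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2}+\frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]d\mathbf{x} \\
> &=\mathrm{E}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2} \right]-\mathrm{E}\left[ - \frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]=0
> \end{align}$$
> We can thus state
> $$\mathrm{E}\left[ \left( \frac{ d \log \mathcal{L} }{ d \theta } \right)^{2} \right]=\mathrm{E}\left[ - \frac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]$$
> and hence finally
> $$\sigma ^{2}_{\hat{\tau}}\geq\frac{\left( \dfrac{d\tau}{d\theta}+ \dfrac{db}{d\theta} \right)^{2}}{\mathrm{E}\left[ - \dfrac{ d ^{2}\log \mathcal{L} }{ d \theta ^{2} } \right]}$$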
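Two facts used in the derivation, $\mathrm{E}[d\log\mathcal{L}/d\theta]=0$ and $\mathrm{E}[(d\log\mathcal{L}/d\theta)^{2}]=\mathrm{E}[-d^{2}\log\mathcal{L}/d\theta^{2}]$, can be spot-checked by simulation. This is a minimal sketch, assuming a single observation from an exponential distribution with rate $\theta$ (an example chosen here for illustration, not from the text), for which $\log\mathcal{L}=\log\theta-\theta x$. The variable names and the numerical values are likewise illustrative.

```python
import random
import statistics

theta = 0.5        # illustrative rate parameter (assumption)
draws = 200_000    # number of simulated observations
rng = random.Random(1)

# Derivative of the log-likelihood for one exponential observation:
# d/dtheta (log(theta) - theta * x) = 1/theta - x
scores = [1.0 / theta - rng.expovariate(theta) for _ in range(draws)]

mean_score = statistics.fmean(scores)                    # estimates E[d log L / dtheta]
mean_sq_score = statistics.fmean(s * s for s in scores)  # estimates E[(d log L / dtheta)^2]
neg_second_derivative = 1.0 / theta**2                   # -d^2 log L / dtheta^2, constant in x

print(mean_score, mean_sq_score, neg_second_derivative)
```

Under these assumptions the first number should be close to zero and the last two should be close to each other, up to sampling noise.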