Chain rule


The chain rule is a formula that expresses the derivative of the composition of two functions. Given two real univariate functions $f(t)$ and $g(x)$, their composition is $h(t)=g(f(t))=(g\circ f)(t)$. The derivative of $h$ is

$$h'(t)=g'(f(t))\,f'(t)$$
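As a quick sanity check of the univariate formula, the sketch below compares the chain-rule derivative with a central finite difference. The choices $f(t)=t^2$ and $g(x)=\sin x$, the evaluation point, and the step size `eps` are illustrative assumptions, not part of the statement above.

```python
import math

# Illustrative choice (an assumption): f(t) = t^2, g(x) = sin(x).
def f(t):
    return t * t

def df(t):
    return 2.0 * t

def g(x):
    return math.sin(x)

def dg(x):
    return math.cos(x)

def h(t):
    # Composition h = g o f.
    return g(f(t))

def dh_chain(t):
    # Chain rule: h'(t) = g'(f(t)) * f'(t).
    return dg(f(t)) * df(t)

def dh_numeric(t, eps=1e-6):
    # Central finite difference, used only as an independent check.
    return (h(t + eps) - h(t - eps)) / (2.0 * eps)

t0 = 0.7
print(dh_chain(t0), dh_numeric(t0))
```

The two printed values should agree up to the accuracy of the finite difference.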

This can be extended to arbitrary dimensions. Say now that $f$ and $g$ are defined as $f:\mathbb{R}^{N}\to \mathbb{R}^{M}$ and $g:\mathbb{R}^{M}\to \mathbb{R}^{L}$. The composition is now $h:\mathbb{R}^{N}\to \mathbb{R}^{L}$. The differential of $h$ is given by

$$Dh(t)=D(g\circ f)(t)=Dg(f(t))\cdot Df(t)$$

The dot operator "$\cdot$" represents a product (composition) of linear operators. If they are given in matrix form (i.e. as the Jacobians), it is a matrix multiplication.
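The matrix form can be checked numerically. The following is a minimal sketch with $N=2$, $M=3$, $L=2$: it multiplies hand-written Jacobians of two example maps and compares the product with a finite-difference Jacobian of the composition. The maps `f` and `g`, the helpers `matmul` and `numeric_jacobian`, and the point `t0` are all hypothetical choices made for illustration.

```python
import math

# Example maps (assumptions): f: R^2 -> R^3 and g: R^3 -> R^2.
def f(t):
    t1, t2 = t
    return [t1 * t2, t1 + t2, math.sin(t1)]

def jac_f(t):
    # 3x2 Jacobian of f, row j holds the partial derivatives of f_j.
    t1, t2 = t
    return [[t2, t1], [1.0, 1.0], [math.cos(t1), 0.0]]

def g(x):
    x1, x2, x3 = x
    return [x1 * x2, x2 + x3 * x3]

def jac_g(x):
    # 2x3 Jacobian of g, row a holds the partial derivatives of g_a.
    x1, x2, x3 = x
    return [[x2, x1, 0.0], [0.0, 1.0, 2.0 * x3]]

def h(t):
    return g(f(t))

def matmul(A, B):
    # Plain matrix product of nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numeric_jacobian(func, point, eps=1e-6):
    # Central finite differences, one input coordinate at a time.
    n_out = len(func(point))
    J = [[0.0] * len(point) for _ in range(n_out)]
    for b in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[b] += eps
        minus[b] -= eps
        fp, fm = func(plus), func(minus)
        for a in range(n_out):
            J[a][b] = (fp[a] - fm[a]) / (2.0 * eps)
    return J

t0 = [0.3, -1.2]
# Chain rule: Dh(t) = Dg(f(t)) . Df(t), a (2x3)(3x2) = 2x2 matrix.
J_chain = matmul(jac_g(f(t0)), jac_f(t0))
J_numeric = numeric_jacobian(h, t0)
max_err = max(abs(J_chain[a][b] - J_numeric[a][b])
              for a in range(2) for b in range(2))
print(J_chain)
print(max_err)
```

Note that the Jacobian of $g$ is evaluated at `f(t0)`, not at `t0`, and that the order of the factors matters because the shapes only match as $(L\times M)(M\times N)$.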

For the partial derivative of the $a$-th component of $h$ with respect to its $b$-th argument $t_{b}$, and calling $x_{j}$ the arguments of $g$, it reads as

$$\frac{ \partial h_{a} }{ \partial t_{b} }(t)=\sum_{j=1}^{M}\left.{\frac{ \partial g_{a} }{ \partial x_{j} }}\right|_{x=f(t)} \frac{ \partial f_{j} }{ \partial t_{b} } (t)$$

where the vertical bar denotes that the derivative of $g_{a}$ is evaluated at $x=f(t)$.
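A small worked example may help to read the component formula. The functions below are chosen purely for illustration (an assumption, with $N=2$, $M=2$, $L=1$): take $f(t_{1},t_{2})=(t_{1}t_{2},\ t_{1}+t_{2})$ and $g(x_{1},x_{2})=x_{1}^{2}+x_{2}$, so that $h(t)=t_{1}^{2}t_{2}^{2}+t_{1}+t_{2}$. For $b=1$ the sum over $j$ gives

$$\frac{ \partial h }{ \partial t_{1} }(t)=\left.2x_{1}\right|_{x=f(t)}\cdot t_{2}+1\cdot 1=2t_{1}t_{2}^{2}+1$$

which matches what is obtained by differentiating $h(t)=t_{1}^{2}t_{2}^{2}+t_{1}+t_{2}$ directly with respect to $t_{1}$.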