Entropy from Lagrange multipliers


The second law of thermodynamics can be expressed as a constrained maximization of entropy, which we solve here with the method of Lagrange multipliers.

Consider an ensemble with information-theoretical entropy $S=-k_{B}\sum_{x}p(x)\log p(x)$, where $p(x)$ is the probability that the ensemble is in state $x$. As usual, $\sum_{x}p(x)=1$. Its internal energy is $U=\langle E \rangle =\sum_{x}E(x)p(x)$ (see footnote 1). The constraint functions are

$$g_{1}(x)=1-\sum_{x}p(x)$$

which enforces the normalization of the probabilities, and

$$g_{2}(x)=U-\sum_{x}E(x)p(x)$$

which fixes the internal energy.

The Lagrangian $\mathcal{L}$ for these constraints is

$$\begin{align}
\mathcal{L}(x) &= S(x) +\lambda_{1}g_{1}(x) +\lambda_{2}g_{2}(x) \\
&= -k_{B}\sum_{x}p(x)\log p(x)+\lambda_{1}\left( 1-\sum_{x}p(x) \right)+\lambda_{2}\left( U-\sum_{x}p(x)E(x) \right)
\end{align}$$

where $k_{B}$ is the Boltzmann constant. The Lagrange multiplier theorem tells us that if some value $\bar{x}$ is a maximum of $S$ subject to the constraints, then there exist specific values of $\lambda_{1}$ and $\lambda_{2}$ such that $\bar{x}$ is a stationary point of $\mathcal{L}$:

$$\text{If }S(\bar{x})\text{ is a maximum}\quad\Rightarrow \quad \nabla \mathcal{L}(\bar{x};\lambda_{1},\lambda_{2})=0$$

In our case, $\mathcal{L}$ depends only on the probabilities $p(x)$, so the gradient is just the derivative with respect to $p$:

$$\begin{align}
\frac{d\mathcal{L}}{dp} &= 0 \\
&= \frac{d}{dp}\left( -k_{B}\sum_{x}p(x)\log p(x) \right)+ \lambda_{1}\frac{d}{dp}\left( 1-\sum_{x}p(x) \right)+ \lambda_{2}\frac{d}{dp}\left( U-\sum_{x}p(x)E(x) \right) \\
&= -k_{B}\sum_{x}\frac{d}{dp}(p\log p)-\lambda_{1}\sum_{x}\frac{d}{dp}p-\lambda_{2}\sum_{x}\frac{d}{dp}pE \\
&= -k_{B}\sum_{x}(\log p+1)-\lambda_{1}\sum_{x}1-\lambda_{2}\sum_{x}E \\
&= -\sum_{x}\left[ k_{B}\log p(x)+k_{B}+\lambda_{1}+\lambda_{2}E(x) \right]
\end{align}$$

Each term in the sum must vanish individually, because the probabilities $p(x)$ of different states can be varied independently of one another:

$$0=k_{B}\log p(x)+k_{B}+\lambda_{1}+\lambda_{2}E(x)$$

Solving for $\log p(x)$, we get

$$\log p(x)=\frac{k_{B}+\lambda_{1}+\lambda_{2}E(x)}{-k_{B}}= \frac{-k_{B}-\lambda_{1}-\lambda_{2}E(x)}{k_{B}}$$

Therefore

$$p(x)=e^{(-k_{B}-\lambda_{1}-\lambda_{2}E(x))/k_{B}}$$
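The stationarity condition can be checked numerically. The following is a minimal sketch, not part of the original derivation: the multiplier values, the energy levels, the value of $U$ and the choice of units with $k_{B}=1$ are all assumptions made for illustration. It evaluates the Lagrangian at the exponential form of $p(x)$ and estimates each partial derivative $\partial\mathcal{L}/\partial p(x)$ by central finite differences.

```python
import math

k_b = 1.0                    # assumption: units where k_B = 1
lam1, lam2 = 0.5, 1.0        # hypothetical multiplier values
energies = [0.0, 1.0, 2.0]   # hypothetical energy levels E(x)
u_target = 0.3               # hypothetical U; a constant that drops out of the gradient


def lagrangian(p):
    """L = S + lam1 * g1 + lam2 * g2 for a list of probabilities p."""
    entropy = -k_b * sum(pi * math.log(pi) for pi in p)
    g1 = 1.0 - sum(p)
    g2 = u_target - sum(pi * e for pi, e in zip(p, energies))
    return entropy + lam1 * g1 + lam2 * g2


# Candidate stationary point: p(x) = exp((-k_B - lam1 - lam2 * E(x)) / k_B).
p_star = [math.exp((-k_b - lam1 - lam2 * e) / k_b) for e in energies]

# Central finite differences for each partial derivative dL/dp(x).
h = 1e-6
gradient = []
for i in range(len(p_star)):
    up = list(p_star)
    up[i] += h
    down = list(p_star)
    down[i] -= h
    gradient.append((lagrangian(up) - lagrangian(down)) / (2 * h))

print(gradient)  # every component should be zero up to finite-difference error
```

With arbitrary multipliers the candidate is stationary but not yet normalized, which is why the normalization condition still has to be imposed.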

This equation must satisfy probability normalization:

$$\sum_{x}p(x)=1=\sum_{x}e^{(-k_{B}-\lambda_{1})/k_{B}}e^{-\lambda_{2}E(x)/k_{B}}=e^{(-k_{B}-\lambda_{1})/k_{B}}Z$$

where we introduced the partition function $Z$

$$Z\equiv\sum_{x}e^{-\lambda_{2}E(x)/k_{B}}$$

We can therefore write

$$p(x)=\frac{1}{Z} e^{-\lambda_{2}E(x)/k_{B}}$$
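As an illustration of why this distribution is the constrained maximum, the sketch below builds $p(x)$ for a hypothetical three-level system and perturbs it along a direction that leaves both constraints untouched. The energy levels, the value of $\lambda_{2}$, the perturbation size and the units with $k_{B}=1$ are assumptions chosen for the example, not values from the text.

```python
import math

k_b = 1.0                    # assumption: units where k_B = 1
lam2 = 1.0                   # hypothetical value of the second multiplier
energies = [0.0, 1.0, 2.0]   # hypothetical, equally spaced energy levels


def entropy(p):
    """S = -k_B * sum_x p(x) log p(x)."""
    return -k_b * sum(pi * math.log(pi) for pi in p)


weights = [math.exp(-lam2 * e / k_b) for e in energies]
z = sum(weights)                  # partition function Z
p = [w / z for w in weights]      # p(x) = exp(-lam2 * E(x) / k_B) / Z

# For these levels, d = (1, -2, 1) changes neither sum(p) nor sum(p * E),
# so p + t * d satisfies both constraints for any small t.
d = [1.0, -2.0, 1.0]
perturbed = {t: [pi + t * di for pi, di in zip(p, d)] for t in (-0.01, 0.01)}

s_max = entropy(p)
s_perturbed = {t: entropy(q) for t, q in perturbed.items()}
print(s_max, s_perturbed)  # both perturbed entropies should be below s_max
```

Because the entropy is strictly concave in the probabilities, any feasible perturbation should lower it; the two values of `t` are only a spot check of that property.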

Note how the probability depends only on the second multiplier; the first has been absorbed into the normalization factor $1/Z$. Now that we know the probabilities, we can calculate the entropy directly:

$$\begin{align}
S &= -k_{B}\sum_{x}p(x)\log p(x) \\
&= -k_{B}\sum_{x}p(x)\left[ -\log Z- \frac{\lambda_{2}E(x)}{k_{B}} \right] \\
&= k_{B}\log Z\underbrace{ \sum_{x}p(x) }_{ 1 }+\lambda_{2}\underbrace{ \sum_{x}p(x)E(x) }_{ U } \\
&= k_{B}\log Z+\lambda_{2}U
\end{align}$$
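The closed form can be compared against the defining sum. This is a minimal sketch under assumed values: the energy levels, $\lambda_{2}$ and the units with $k_{B}=1$ are hypothetical.

```python
import math

k_b = 1.0                         # assumption: units where k_B = 1
lam2 = 0.7                        # hypothetical value of the second multiplier
energies = [0.0, 0.5, 1.5, 3.0]   # hypothetical energy levels E(x)

weights = [math.exp(-lam2 * e / k_b) for e in energies]
z = sum(weights)                  # partition function Z
p = [w / z for w in weights]      # p(x) = exp(-lam2 * E(x) / k_B) / Z

u = sum(pi * e for pi, e in zip(p, energies))          # U = sum_x p(x) E(x)
s_direct = -k_b * sum(pi * math.log(pi) for pi in p)   # definition of S
s_closed = k_b * math.log(z) + lam2 * u                # k_B log Z + lam2 * U

print(s_direct, s_closed)  # the two values should agree up to rounding
```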

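We want to determine what $\lambda_{2}$ is. To do this, we differentiate $S$ with respect to $U$. Although $Z$ depends on $\lambda_{2}$, and $\lambda_{2}$ in turn depends on $U$, that contribution cancels because $k_{B}\,\partial \log Z/\partial \lambda_{2}=-U$, leaving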

$$\frac{ \partial S }{ \partial U } =\lambda_{2}$$

and using the thermodynamic definition of temperature

$$T=\frac{ \partial U }{ \partial S } \quad\to \quad \frac{1}{T}=\frac{ \partial S }{ \partial U }$$

we can state

$$\boxed{\lambda_{2}=\frac{1}{T}}$$

where $T$ is the temperature. Therefore, the entropy is

$$\boxed{S=\frac{U}{T}+k_{B}\log Z}$$
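The identification $\partial S/\partial U=\lambda_{2}=1/T$ can also be probed numerically. The sketch below is an illustration under assumed values (hypothetical energy levels, a hypothetical $\lambda_{2}$, units with $k_{B}=1$): it moves $\lambda_{2}$ slightly in both directions, recomputes $U$ and $S$ from their definitions, and takes the ratio of the resulting changes as a finite-difference estimate of the slope of $S$ against $U$.

```python
import math

k_b = 1.0                         # assumption: units where k_B = 1
energies = [0.0, 0.5, 1.5, 3.0]   # hypothetical energy levels E(x)


def u_and_s(lam2):
    """Internal energy and entropy of p(x) = exp(-lam2 * E(x) / k_B) / Z."""
    weights = [math.exp(-lam2 * e / k_b) for e in energies]
    z = sum(weights)
    p = [w / z for w in weights]
    u = sum(pi * e for pi, e in zip(p, energies))
    s = -k_b * sum(pi * math.log(pi) for pi in p)
    return u, s


lam2 = 0.7   # hypothetical multiplier, i.e. T = 1 / lam2 in these units
h = 1e-4     # small step in lam2 used for the finite difference
u_hi, s_hi = u_and_s(lam2 + h)
u_lo, s_lo = u_and_s(lam2 - h)

slope = (s_hi - s_lo) / (u_hi - u_lo)   # finite-difference estimate of dS/dU
print(slope, lam2)  # the estimate should be close to lam2
```

The entropy here is computed from $-k_{B}\sum_{x}p(x)\log p(x)$ rather than from the boxed formula, so the check does not assume the result it is testing.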

Footnotes

  1. $E$ is actually the Hamiltonian. The usage of $E$ instead of $H$ is due to $H$ being used for entropy in information theory, whereas $E$ is used for the cost function. Since this is an information theory problem, information theory conventions are favored.