
Calculus of Variations: Maximize Entropy

Let $X$ be a random variable taking values on the real line. The probability that $X$ takes a value less than or equal to a given real number $x$ is obtained by integrating the probability density function (pdf) $\rho$:

\[P(X \leq x) = \int_{-\infty}^x \rho(y) dy .\]

Since $X$ must take some value, we have that

\[\int_{-\infty}^{\infty} \rho(y) dy = 1 .\]

In many problems, one is interested in determining the probability density $\rho$ based on knowledge of certain expectation values. For instance, suppose that we know the variance of $X$ is $\sigma^2$, for some $\sigma > 0$ (taking the mean of $X$ to be zero, so that the variance equals the second moment). In other words, we know that

\[\sigma^2 = \int_{\mathbb{R}} x^2 \rho(x) dx .\]

We would like to find the pdf that is least biased, i.e., the one that assumes nothing beyond these constraints. The answer is provided by a variational principle called the “Principle of Maximum Entropy”. This principle states that $\rho(x)$ is obtained by maximizing the entropy

\[S[\rho(\cdot)] = -\int_{\mathbb{R}} \rho(x) \ln(\rho(x)) dx ,\]

subject to the constraints

\[\int_{\mathbb{R}} \rho(x) dx = 1,\]

and

\[\sigma^2 = \int_{\mathbb{R}} x^2 \rho(x) dx .\]

Since maximizing $S$ is equivalent to minimizing $-S = \int_{\mathbb{R}} \rho \ln(\rho) dx$, and the constraints can be enforced with Lagrange multipliers $\lambda_1$ and $\lambda_2$, the variational problem we need to solve is: Minimize the functional

\[\hat{I} [\rho(\cdot)] = \int_{\mathbb{R}} \rho \ln(\rho) dx + \lambda_1 \left(\int_{\mathbb{R}} \rho(x) dx - 1\right) + \lambda_2 \left( \int_{\mathbb{R}} x^2 \rho(x) dx - \sigma^2\right).\]
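Before solving this analytically, it is worth noting that the problem can also be attacked numerically. The sketch below is an illustrative aside, not part of the original notes: the truncation of the real line to $[-6, 6]$, the grid size, the choice $\sigma = 1$, and the use of SciPy's SLSQP solver are all assumptions. It discretizes $\rho$, maximizes the discrete entropy subject to the two constraints, and can be compared against the closed-form answer derived below.

```python
import numpy as np
from scipy.optimize import minimize

# Discretize the real line on a finite grid (the truncation to [-6, 6]
# and sigma = 1 are assumptions made for this illustration).
sigma = 1.0
x = np.linspace(-6.0, 6.0, 201)
dx = x[1] - x[0]

def neg_entropy(rho):
    # We minimize int rho*ln(rho) dx, i.e. maximize S = -int rho*ln(rho) dx.
    return np.sum(rho * np.log(rho)) * dx

constraints = [
    {"type": "eq", "fun": lambda rho: np.sum(rho) * dx - 1.0},              # normalization
    {"type": "eq", "fun": lambda rho: np.sum(x**2 * rho) * dx - sigma**2},  # second moment
]

rho0 = np.full_like(x, 1.0 / (x[-1] - x[0]))  # uniform initial guess
result = minimize(neg_entropy, rho0, method="SLSQP",
                  bounds=[(1e-12, None)] * x.size, constraints=constraints)

# Compare with the Gaussian derived analytically below.
gaussian = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print("max deviation from Gaussian:", np.abs(result.x - gaussian).max())
```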

The augmented Lagrangian (the integrand of $\hat{I}$) is:

\[\hat{L}(x,\rho, \rho') = \rho \ln(\rho) + \lambda_1 \rho + \lambda_2 x^2 \rho .\]

The Euler-Lagrange Equation:

Recall that the Euler-Lagrange equation reads

\[\frac{\partial \hat{L}}{\partial \rho} - \frac{d}{dx}\left( \frac{\partial \hat{L}}{\partial \rho'} \right) = 0.\]

In this case, $\hat{L}$ does not depend on $\rho'$, so the second term vanishes and the equation reduces to

\[\ln(\rho) + 1 + \lambda_1 + \lambda_2 x^2 = 0 .\]

Solving for $\rho$ gives $\rho(x) = e^{-(\lambda_1 + 1) - \lambda_2 x^2}$. Since the multipliers are still undetermined, we define $\hat{\lambda}_1 = -(\lambda_1 + 1)$ and absorb the sign of $\lambda_2$ (relabeling $-\lambda_2$ as $\lambda_2$), so that

\[\rho(x) = e^{\hat{\lambda}_1 + \lambda_2 x^2 } .\]

The first constraint gives

\[\int_{\mathbb{R}} \rho(x) dx = \int_{\mathbb{R}} e^{\hat{\lambda}_1 + \lambda_2 x^2 } dx = e^{\hat{\lambda}_1 } \int_{\mathbb{R}} e^{\lambda_2 x^2 } dx = 1 .\]

For $\lambda_2 \geq 0$, the integral $\int_{\mathbb{R}} e^{\lambda_2 x^2 } dx$ does not converge, so we must have $\lambda_2 = - k^2 < 0$ for some $k > 0$. Using the Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}$,

\[1 = e^{\hat{\lambda}_1 } \int_{-\infty}^{\infty} e^{- k^2 x^2 } dx = \frac{e^{\hat{\lambda}_1 } }{k} \int_{-\infty}^{\infty} e^{- (kx)^2 } d(k x) = \frac{e^{\hat{\lambda}_1 } }{k} \sqrt{\pi}.\]
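If you want to double-check this substitution, the integral can be verified symbolically. The snippet below is a small sketch assuming SymPy, which is not used in the original notes:

```python
import sympy as sp

x, k = sp.symbols("x k", positive=True)

# int_{-oo}^{oo} exp(-k^2 x^2) dx ; expected value: sqrt(pi)/k
print(sp.integrate(sp.exp(-k**2 * x**2), (x, -sp.oo, sp.oo)))
```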

The second constraint gives (the integral can be evaluated by differentiating the Gaussian integral with respect to its parameter, by integration by parts, or simply with Mathematica)

\[\sigma^2 = \int_{\mathbb{R}} x^2 e^{\hat{\lambda}_1 - k^2 x^2 } dx = e^{\hat{\lambda}_1} \frac{\sqrt{\pi}}{2 k^3} .\]
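The same symbolic check works for the second-moment integral (again a sketch assuming SymPy):

```python
import sympy as sp

x, k = sp.symbols("x k", positive=True)

# int_{-oo}^{oo} x^2 exp(-k^2 x^2) dx ; expected value: sqrt(pi)/(2 k^3)
print(sp.integrate(x**2 * sp.exp(-k**2 * x**2), (x, -sp.oo, sp.oo)))
```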

Recall the equation obtained from the first constraint,

\[\frac{e^{\hat{\lambda}_1 } }{k} \sqrt{\pi} = 1.\]

We can solve for $k$ by dividing the first of these two equations by the second, which gives $\sigma^2 = \frac{1}{2 k^2}$, i.e.,

\[k^2 = \frac{1}{2 \sigma^2} .\]

Substituting this back into the first-constraint equation, we solve for $e^{\hat{\lambda}_1}$:

\[e^{\hat{\lambda}_1} = \frac{1}{\sqrt{2 \pi \sigma^2}} .\]
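For completeness, the pair of equations can also be handed to a computer algebra system. The snippet below is a sketch assuming SymPy, with a hypothetical variable $A$ standing in for $e^{\hat{\lambda}_1}$:

```python
import sympy as sp

k, A, sigma = sp.symbols("k A sigma", positive=True)  # A plays the role of e^{lambda_1 hat}

# First constraint:  A*sqrt(pi)/k       = 1
# Second constraint: A*sqrt(pi)/(2 k^3) = sigma^2
sol = sp.solve([A * sp.sqrt(sp.pi) / k - 1,
                A * sp.sqrt(sp.pi) / (2 * k**3) - sigma**2],
               [k, A], dict=True)
print(sol)  # expected: k = 1/(sqrt(2)*sigma), A = 1/sqrt(2*pi*sigma**2)
```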

Putting everything together, the probability density function is

\[\rho(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{x^2}{2 \sigma^2}} .\]

This is called the normal (Gaussian) distribution: among all pdfs on $\mathbb{R}$ with zero mean and variance $\sigma^2$, it is the one with maximum entropy.
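As a final sanity check, one can verify numerically that this density satisfies both constraints and compare its entropy with the closed-form Gaussian entropy $\frac{1}{2}\ln(2 \pi e \sigma^2)$. The sketch below assumes SciPy; the choice $\sigma = 2$ is arbitrary:

```python
import numpy as np
from scipy.integrate import quad

sigma = 2.0  # arbitrary value chosen for this check
Z = np.sqrt(2 * np.pi * sigma**2)
rho = lambda y: np.exp(-y**2 / (2 * sigma**2)) / Z

total, _ = quad(rho, -np.inf, np.inf)                       # expect 1
second, _ = quad(lambda y: y**2 * rho(y), -np.inf, np.inf)  # expect sigma**2
# -ln(rho) = y^2/(2 sigma^2) + ln(Z); written out explicitly to avoid
# taking the log of an underflowed rho in the far tails
entropy, _ = quad(lambda y: rho(y) * (y**2 / (2 * sigma**2) + np.log(Z)),
                  -np.inf, np.inf)

print(total, second, entropy)
print("closed-form entropy 0.5*ln(2*pi*e*sigma^2):",
      0.5 * np.log(2 * np.pi * np.e * sigma**2))
```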
