Now that we know what a likelihood is, we turn our focus to the Maximum Likelihood Estimator (MLE), which is simply the parameter value that maximizes the likelihood function.

We will use the likelihood of a normal distribution to find the optimal values of the mean \(\mu\) and the standard deviation \(\sigma\) given an observation \(x\), i.e., we maximize \(L(\mu, \sigma~|~x)\).

Let’s start with the MLE for \(\mu\). If we fix \(\sigma\) at a given value and plug many candidate values of \(\mu\) into our normal likelihood function, we get a likelihood value for each candidate, i.e., \[L(\mu's~|~\sigma, x)\] Then we check which \(\mu\) gives the maximum likelihood. Note that the peak is where the slope of the curve is 0.
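To make this concrete, here is a minimal sketch in Python (using NumPy and SciPy; the observation \(x = 2\) and the fixed \(\sigma = 1\) are illustrative assumptions) that scans a grid of candidate \(\mu\) values and picks the one with the highest likelihood:

```python
import numpy as np
from scipy.stats import norm

x = 2.0       # illustrative single observation (assumption)
sigma = 1.0   # sigma held fixed (assumption)

mus = np.linspace(-2.0, 6.0, 801)                 # candidate mu values
likelihoods = norm.pdf(x, loc=mus, scale=sigma)   # L(mu | sigma, x) for each candidate

best_mu = mus[np.argmax(likelihoods)]
print(best_mu)  # ~2.0: for one observation the likelihood peaks at mu = x
```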

We do the same for \(\sigma\), this time fixing \(\mu\): \[L(\sigma's~|~\mu, x)\]
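The same grid-scan sketch works for \(\sigma\). Here \(\mu\) is deliberately fixed away from \(x\) (an illustrative choice): if \(\mu = x\), the single-observation likelihood keeps growing as \(\sigma\) shrinks toward 0, so there would be no interior peak.

```python
import numpy as np
from scipy.stats import norm

x = 2.0    # illustrative single observation (assumption)
mu = 1.0   # mu held fixed, offset from x (assumption)

sigmas = np.linspace(0.1, 5.0, 491)               # candidate sigma values
likelihoods = norm.pdf(x, loc=mu, scale=sigmas)   # L(sigma | mu, x) for each candidate

best_sigma = sigmas[np.argmax(likelihoods)]
print(best_sigma)  # ~1.0 = |x - mu| for a single observation
```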

Now that we know the likelihood functions for both normal parameters \(\mu\) and \(\sigma\) for a single observation, we extend this logic to \(n\) observations. Because each sample is independent, the likelihood function for \(n\) samples is the product of the individual likelihoods:

\[\begin{aligned} L(\mu, \sigma~|~x_1, x_2, \dots, x_n) &=L(\mu, \sigma~|~x_1)\cdots L(\mu, \sigma~|~x_n) \\ &=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_1-\mu)^2}{2\sigma^2}} \cdots\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_n-\mu)^2}{2\sigma^2}} \end{aligned}\]
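As a sketch of this product (the data values and parameter choices below are made-up assumptions), the \(n\)-sample likelihood is just the individual normal densities multiplied together:

```python
import numpy as np
from scipy.stats import norm

data = np.array([4.2, 5.1, 3.8, 4.9, 5.5])  # illustrative samples (assumption)

def likelihood(mu, sigma, x):
    """L(mu, sigma | x_1, ..., x_n): product of the individual normal pdfs."""
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

print(likelihood(4.7, 0.6, data))   # a well-matched (mu, sigma): relatively large
print(likelihood(0.0, 0.6, data))   # a poorly matched mu: essentially zero
```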

To find the MLE for \(\mu\), we take the derivative of \(L\) with respect to \(\mu\), set it to 0, and solve for \(\mu\). One trick we can use is to take the derivative of \(\ln L\) instead of \(L\), which makes the calculation much easier. Because the logarithm is monotonically increasing, both functions peak at the same values of \(\mu\) and \(\sigma\).

\[\begin{aligned} \ln[L(\mu, \sigma~|~x_1, x_2, \dots, x_n)] &= \ln[L(\mu, \sigma~|~x_1)]+ \dots +\ln[L(\mu, \sigma~|~x_n)] \\ &= \ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_1-\mu)^2}{2\sigma^2}}\right) + \dots + \ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_n-\mu)^2}{2\sigma^2}}\right)\\ &= \left(-\frac{1}{2}\ln(2\pi\sigma^2)-\frac{(x_1-\mu)^2}{2\sigma^2}\right) + \dots + \left(-\frac{1}{2}\ln(2\pi\sigma^2)-\frac{(x_n-\mu)^2}{2\sigma^2}\right)\\ &= -\frac{1}{2}\ln(2\pi)-\frac{1}{2}\ln\sigma^2-\frac{(x_1-\mu)^2}{2\sigma^2} - \dots- \frac{1}{2}\ln(2\pi)-\frac{1}{2}\ln\sigma^2-\frac{(x_n-\mu)^2}{2\sigma^2}\\ &= -\frac{n}{2}\ln(2\pi)-n\ln\sigma-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \end{aligned}\]
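A quick numeric check of the last line (again with illustrative data and parameters as assumptions) confirms that the closed form matches the sum of the individual log-densities:

```python
import numpy as np
from scipy.stats import norm

data = np.array([4.2, 5.1, 3.8, 4.9, 5.5])  # illustrative samples (assumption)
mu, sigma = 4.7, 0.6                         # illustrative parameters (assumption)
n = len(data)

sum_of_logs = np.sum(norm.logpdf(data, loc=mu, scale=sigma))
closed_form = (-n / 2 * np.log(2 * np.pi)
               - n * np.log(sigma)
               - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

print(np.isclose(sum_of_logs, closed_form))  # True
```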

Hence, the function we differentiate with respect to \(\mu\) is:

\[-\frac{n}{2}\ln(2\pi)-n\ln\sigma-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\]

Taking the derivative, setting it to 0, and solving for \(\mu\):

\[\begin{aligned} \frac{\partial}{\partial \mu} \ln[L(\mu, \sigma~|~x_1, x_2, \dots , x_n)] &= \frac{1}{\sigma^2}\left(\sum_{i=1}^{n}x_i-n\mu\right) = 0 \\ \Rightarrow~ \mu &= \frac{1}{n}\sum_{i=1}^{n}x_i \end{aligned}\]

Thus, the MLE for \(\mu\) equals the mean of the data, which is the center of the normal distribution.
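As a sanity check, a sketch that maximizes the log-likelihood over \(\mu\) numerically (illustrative data again; \(\sigma\) is held at an arbitrary fixed value, since the maximizer in \(\mu\) does not depend on it) recovers the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

data = np.array([4.2, 5.1, 3.8, 4.9, 5.5])  # illustrative samples (assumption)
sigma = 0.6                                  # any fixed positive value works

def neg_log_lik(mu):
    # Minimizing the negative log-likelihood == maximizing the likelihood.
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize_scalar(neg_log_lik)
print(result.x, np.mean(data))  # both ~4.7: the MLE for mu is the sample mean
```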

Next, we take the derivative with respect to \(\sigma\):

\[\begin{aligned} \frac{\partial}{\partial \sigma} \ln[L(\mu, \sigma~|~x_1, x_2, \dots , x_n)] &= -\frac{n}{\sigma}+\frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0 \\ \Rightarrow~ \sigma &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2} \end{aligned}\]

Thus, the MLE for \(\sigma\) equals the standard deviation of the data, which is the width (spread) of the normal distribution. Note that this formula divides by \(n\) rather than \(n-1\): the MLE is the population standard deviation, not the bias-corrected sample standard deviation.
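Finally, a sketch (with the same illustrative data) confirming that the closed-form MLE for \(\sigma\) is exactly what np.std computes with ddof=0:

```python
import numpy as np

data = np.array([4.2, 5.1, 3.8, 4.9, 5.5])  # illustrative samples (assumption)

mu_hat = np.mean(data)
sigma_hat = np.sqrt(np.mean((data - mu_hat) ** 2))  # divides by n, not n - 1

print(sigma_hat, np.std(data, ddof=0))  # identical values
```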