# MLE, Fisher information, and related theory

It wasn’t until I entered CMU that I became aware of the great debate between the frequentist world and the Bayesian world. In the simplest terms, the former models the world assuming fixed parameters $$\theta$$, while the latter treats those parameters as uncertain. Maximum likelihood estimation is at the heart of machine learning, and hence it deserves a lucid explanation. At first I wanted to write a post just about the Fisher information matrix, but then I stumbled across a write-up by Konstantin Kashin titled Statistical Inference: Maximum Likelihood Estimation. I believe it is self-contained and explains the material well. However, for the benefit of the reader, I have added sidenotes to make some of the derivations clearer. Sections 1 and 2 cover the basics of MLE, which beginners should definitely go through. Intermediate and advanced readers can jump to section 3 for more commentary and notes.
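Before diving in, here is a minimal sketch (my own toy example, not from the linked write-up) of what MLE looks like in practice: for coin flips modeled as Bernoulli($$p$$), the log-likelihood is maximized at the sample mean, and we can confirm this by maximizing the log-likelihood numerically over a grid of candidate values.

```python
import numpy as np

# Toy example: MLE for the success probability p of Bernoulli coin flips.
# The log-likelihood of observations x_1..x_n is
#   l(p) = sum_i [ x_i * log(p) + (1 - x_i) * log(1 - p) ]
# and setting dl/dp = 0 yields the closed-form MLE: p_hat = sample mean.

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=1000)  # 1000 flips of a biased coin

def log_likelihood(p, x):
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Locate the maximum numerically over a grid of candidate p values.
grid = np.linspace(0.01, 0.99, 999)
p_hat_numeric = grid[np.argmax([log_likelihood(p, data) for p in grid])]

p_hat_closed = data.mean()  # closed-form MLE

print(p_hat_numeric, p_hat_closed)
```

Both estimates should agree (up to the grid resolution) and land close to the true parameter 0.7, which is the essence of the consistency results discussed later.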