At work, I’ve been working on text classification problems. I began with a simple Random Forest-based model and have now switched to a hierarchical deep neural network for a domain-specific problem. Meanwhile, I’ve been investigating a...

# Hidden Markov Models

Recently, I’ve taken up reading on speech recognition, and it’s good to refresh some of the basic models that have traditionally been part of ASR. There are great videos on the fundamentals of speech and signals in the course on Foundations for Speech Processing.

# MLE, Fisher information, and related theory

It wasn’t until I entered CMU that I became aware of the great debate between the frequentist world and the Bayesian world. In the simplest terms, the former models the world assuming constant parameters $$\theta$$, while the latter assumes uncertainty in those...
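
To make the contrast concrete, here is a minimal sketch on a coin-flip (Bernoulli) example. The choice of example, the Beta(1, 1) prior, and the function names `mle_bernoulli`, `beta_posterior`, and `beta_mean` are my own assumptions for illustration, not something taken from the post itself. The frequentist side treats $$\theta$$ as a fixed unknown and returns a single point estimate, while the Bayesian side returns a whole distribution over $$\theta$$.

```python
def mle_bernoulli(heads, flips):
    """Frequentist view: theta is a fixed constant; estimate it by maximum likelihood."""
    return heads / flips


def beta_posterior(heads, flips, alpha=1.0, beta=1.0):
    """Bayesian view: theta is uncertain; update a Beta(alpha, beta) prior.

    The Beta prior is conjugate to the Bernoulli likelihood, so the posterior
    is again a Beta distribution with updated parameters.
    """
    return alpha + heads, beta + (flips - heads)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 heads in 10 flips.
theta_hat = mle_bernoulli(7, 10)      # a single number
post_a, post_b = beta_posterior(7, 10)  # parameters of a distribution over theta
posterior_mean = beta_mean(post_a, post_b)
```

With a uniform prior the posterior mean is pulled slightly toward 0.5 relative to the MLE, and the pull shrinks as more flips are observed.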

# KL Divergence

KL divergence is a premetric with roots in information theory. It is closely related to Shannon entropy, and we’ll walk through this relationship in the subsequent discussion. In its most basic sense, KL divergence measures the proximity...
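
As a quick numerical sketch of the entropy relationship, the snippet below checks the standard identity KL(p ‖ q) = H(p, q) − H(p) for two discrete distributions. The distributions `p` and `q` are made-up values for illustration, and the code assumes `q` is nonzero wherever `p` is nonzero.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative distributions over three outcomes (made-up values).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
gap = cross_entropy(p, q) - entropy(p)  # should match kl_pq
```

Here `kl_pq` and `kl_qp` differ, which is one reason KL divergence is only a premetric: it is non-negative and zero when the two distributions coincide, but it is not symmetric.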