Readings on text classification

At work, I’ve been working on text classification problems. I began with a simple Random Forest-based model and have now switched to a hierarchical deep neural network for a domain-specific problem. Meanwhile, I’ve been investigating a...

Hidden Markov Models

Recently, I’ve taken up reading on speech recognition, and it’s good to refresh some basic models that have traditionally been part of automatic speech recognition (ASR). There are great videos on the fundamentals of speech and signals in the course on Foundations for Speech Processing.

Expectation Maximization

This article is about the Expectation Maximization (EM) algorithm and the guarantees it offers for certain kinds of optimization problems. We’ll walk through the gory mathematical details and work out some examples that involve EM. In this article, we assume...

MLE, Fisher information, and related theory

It wasn’t until I entered CMU that I realized how great the debate is between the frequentist world and the Bayesian world. In the simplest terms, the former models the world assuming constant parameters \( \theta \), while the latter assumes uncertainty in those...

KL Divergence

KL divergence is a premetric that has its roots in information theory. It has a close relationship with Shannon entropy, and we’ll walk through this relationship in the subsequent discussion. In its most basic sense, KL divergence measures the proximity...
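
As a quick illustration of the relationship with Shannon entropy, here is a minimal sketch for discrete distributions. It assumes the standard definitions with natural logarithms, so that \( D_{KL}(P \| Q) = H(P, Q) - H(P) \), and it assumes \( q_i > 0 \) wherever \( p_i > 0 \). The function names and the example distributions `p` and `q` are made up for illustration and are not taken from the article.

```python
import math


def entropy(p):
    """Shannon entropy H(P) in nats; terms with p_i = 0 contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(P, Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

forward = kl_divergence(p, q)   # D_KL(P || Q)
backward = kl_divergence(q, p)  # D_KL(Q || P)
gap = cross_entropy(p, q) - entropy(p)  # equals D_KL(P || Q) up to rounding

print(f"D_KL(P || Q) = {forward:.5f}")
print(f"D_KL(Q || P) = {backward:.5f}")
print(f"H(P, Q) - H(P) = {gap:.5f}")
```

The two directions give different values, which is one reason KL divergence is a premetric rather than a true metric: it is non-negative and zero when the two distributions coincide, but it is not symmetric.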