Have You Met… LDA!

In my senior year of college, I was lucky enough to conduct an independent research project under the guidance of my advisor, Professor Lynne Butler at Haverford College, to explore one of the coolest statistical models in digital humanities – Latent Dirichlet Allocation (LDA).

A lot of people asked me how the project was motivated, but things actually fell into place naturally.

Lynne briefly introduced LDA to all her potential advisees before the official advisor assignment and expressed her interest in this model – “suppose you are given a corpus of documents, LDA can help you figure out the topics that each document in that corpus was related to”. Those might not be her exact words, but I was sold immediately and thought of applying LDA to corpora in French, since I’ve always enjoyed French literature. So when Lynne brought up the idea of comparing the results of LDA on a French corpus and its English counterpart, I couldn’t wait to put that idea into words for the thesis topic proposal I submitted to the department, for I could already envision how much I would enjoy the project. The next thing I knew, I was starting my senior year with my dream advisor and a dreamy thesis topic, living the dream of doing research that magically connects statistics and French literature.

More than a year after I submitted my 35-page thesis, I still feel the need to start a blog to relive that unforgettable experience, and perhaps to extend the earlier exploration with my growing knowledge of statistics while I pursue my master’s degree at Columbia.

While I do believe every piece of information on the Internet is made to target a certain audience, I also hope that whoever wanders by and ends up here can find something useful.
