Appendix of Appendix: More on Benchmark Facebook Engagement with Hierarchical Poisson Factorization

About eight months ago, I was desperately hunting for a summer internship. That was THE time for me to explore that thing called “start-ups”, which, believe it or not, I had never heard of before I came to New York. For one of the positions I applied for (*spoiler alert*: I didn’t end up getting it), I was given a data analysis mini project to earn an onsite interview with the company.

The project was intended to test applicants’ data collection and analysis skills, which were essential to the position, and it came in three parts covering three marketing channels – Facebook and Instagram (social media marketing) and e-mail (outbound sales). This project on Facebook engagement was inspired by the Facebook part, where I was given a dataset of Facebook page information for six clothing brands: three activewear brands and three fashion brands. Besides brand information, the data also include Facebook page community size (the number of followers), interactions per post (a direct sum of likes, shares, and comments; I believe this is how the metric “engagement” is normally defined), post frequency (posts per week), and time stamps by quarter. The dataset itself didn’t leave much room for complicated analysis, but the sportswear vs. fashion setting, along with what I learned from the data, was definitely calling for further exploration.

The data show a noticeable discrepancy in social media marketing performance between sportswear and fashion brands, and it seems that sportswear brands did a much better job of reaching and engaging their potential consumers. This is particularly interesting because the sportswear market is a battlefield: there aren’t that many players in the game, but they are fighting a tough war. Sportswear has conventionally been considered occasional clothing (for example, every woman needs a bra, but not every woman is in desperate need of a high-compression sports bra), so it faces a smaller consumer pool with much higher expectations for quality and function.

As a sports lover and curious consumer (and aspiring marketing professional), I have always appreciated all kinds of marketing practices in the sports sector. I grew up feeling “bothered” by the grammar of “Impossible is Nothing” (Adidas). I have dreamt of getting one of those rare Jordan models. And as I watch Kanye West stir up uproar after uproar every year, I cannot help but ask: who, or what, on earth is behind all this?!

Have You Met… LDA!

In my senior year of college, I was lucky enough to conduct an independent research project under the guidance of my advisor, Professor Lynne Butler at Haverford College, to explore one of the coolest statistical models in digital humanities – Latent Dirichlet Allocation (LDA).

A lot of people asked me how the project was motivated, but things actually fell into place naturally.

Lynne briefly introduced LDA to all her potential advisees before the official advisor assignment and expressed her interest in the model: “suppose you are given a corpus of documents, LDA can help you figure out the topics that each document in that corpus was related to”. Those might not be her exact words, but I was sold immediately and thought of applying LDA to corpora in French, since I’ve always enjoyed French literature. So when Lynne brought up the idea of comparing the results of LDA on a French corpus and its English counterpart, I couldn’t wait to put the idea into words for the thesis topic proposal I submitted to the department, for I could already envision how much I would enjoy the project. The next thing I knew, I was starting my senior year with my dream advisor and a dreamy thesis topic, living the dream of doing research that magically connects statistics and French literature.

More than a year after submitting my 35-page thesis, I still feel the need to start a blog to relive that unforgettable experience, and to take the opportunity to extend that exploration with my growing knowledge of statistics while I pursue my master’s degree at Columbia.

While I do believe every piece of information on the Internet is made to target a certain audience, I also hope that whoever wanders in and ends up here can find something useful.