Probabilistic Models of Discrete Data

Spring 2016, Columbia University

David M. Blei

Day/Time: Fridays, 12:10PM-2:30PM
Location: Mudd 633

Piazza site

Course Description

We will study probabilistic models of discrete data, focusing on large-scale data sets that are high-dimensional and sparse. Discrete data sets arise in diverse applications of statistical machine learning, such as natural language processing, recommendation systems, computational neuroscience, and statistical genetics. Topics will include embeddings, mixed-membership models (topic models), scalable computation, Bayesian nonparametrics, and model diagnosis. Over the semester, each student will be expected to complete an ambitious project centered on a real-world problem.

Prerequisites. The prerequisite course is Foundations of Graphical Models, and you should be comfortable with its material. Specifically, you should be able to write down a new model in which each complete conditional is in the exponential family, derive and implement an approximate inference algorithm for the model, and understand how to interpret its results. You should also be fluent in the semantics of graphical models. Finally, note that this is a seminar. It is only open to PhD students. Auditors are not permitted.
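As a concrete illustration of the kind of derivation the prerequisite assumes (this sketch is not part of the course materials, and the names are illustrative): in a Gamma-Poisson model with counts x_1, ..., x_n ~ Poisson(lambda) and lambda ~ Gamma(a, b), the complete conditional of lambda is Gamma(a + sum_i x_i, b + n), i.e., it stays in the exponential family. A minimal sketch in Python:

```python
import numpy as np

def gamma_poisson_complete_conditional(x, a=1.0, b=1.0):
    """Complete conditional of a Poisson rate under a Gamma(a, b) prior.

    With counts x_1..x_n ~ Poisson(lam) and lam ~ Gamma(a, b)
    (shape/rate parameterization), the complete conditional is
    Gamma(a + sum(x), b + n) -- the conjugate, exponential-family
    update the prerequisite asks you to be able to derive.
    """
    x = np.asarray(x)
    return a + x.sum(), b + x.size

# Example: compute the conditional's parameters and take a Gibbs-style draw.
rng = np.random.default_rng(0)
x = rng.poisson(3.0, size=100)            # simulated counts
shape, rate = gamma_poisson_complete_conditional(x, a=1.0, b=1.0)
lam_draw = rng.gamma(shape, 1.0 / rate)   # numpy's gamma takes a scale
print(shape, rate, lam_draw)
```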

Reading assignments and notes

  1. Introduction and logistics
  2. Word embeddings I
  3. Word embeddings II
  4. Word embeddings III
  5. Factorization I
  6. Aside: Stochastic optimization and variational inference
  7. Inverse regression for text
  8. Bayesian nonparametrics
  9. More Bayesian nonparametrics
  10. Discrete choice models
  11. Statistical analysis of networks