General details

This class is a graduate seminar for students interested in expanding their knowledge of the use of Bayesian statistics in natural language processing (NLP).

The class consists of two parts. The first part consists of lectures by the instructor on various topics in Bayesian NLP. The second part consists of discussions of papers in Bayesian NLP, led each week by different participants in the class.

Grades will be based on the discussions led in class, participation and possibly a white paper (a short paper in which each participant summarizes one or two related papers and suggests some future directions for exploration).

To make the most of this class, participants are expected to have some basic knowledge of probability and statistics, as well as basic familiarity with NLP research.


If you have any questions about the seminar, email me at scohen [strudel] cs.columbia.edu.

Lecture topics

The following are some topics to be covered during the lectures:


To get a feel for some of the core material in this area, check out Sharon Goldwater's Bayesian language modeling reading list. A lot of progress has been made in this area since the list was last updated, but it presents some of the basic papers and reading material that participants could choose to discuss (of course, students may also choose newer papers to present). In the first week of the class, we will compile a more thorough list of papers to choose from. A current seed version of this list is available here, as a PDF file.

News

Tentative schedule and past classes

Date | Lecturer | Topics | Notes | Reading material
1/28 | Shay | Basic refresher on probability and statistics (statistical independence, conditional independence, Bayes' theorem), the Bayesian approach, hypothesis testing, priors in general, Bayesian updating | Slides (most material was presented on the blackboard) | none
2/4 | Shay | Priors, PCFGs, multinomials, conjugacy, Dirichlet distributions, Bayesian point estimates; Q&A session about the assigned reading | Slides (most material was presented on the blackboard) | Chapter 2; optional: Chapter 1
2/11 | Bob Carpenter (guest lecture) | Bayesian domain adaptation | Stan Modeling Language Reference Manual

Bob's blog post about Bayesian inference
Jenny Rose Finkel and Christopher D. Manning (2009). Hierarchical Bayesian Domain Adaptation. Proceedings of NAACL. [pdf]

Optional reading (for data motivation): John Blitzer, Mark Dredze and Fernando Pereira (2007). Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Proceedings of ACL. [pdf]
2/18 | Rachel, Armineh | Bayesian estimation, basic comparison of inference methods | Slides

Armineh's notes for Gao and Johnson (2008)
Chapter 3

Jianfeng Gao and Mark Johnson (2008). A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. Proceedings of EMNLP. [pdf]
2/25 | Shay, Joe | Variational inference, unsupervised POS tagging | Slides (most material was presented on the blackboard) | Sharon Goldwater and Thomas L. Griffiths (2007). A fully Bayesian approach to unsupervised part-of-speech tagging. Proceedings of ACL. [pdf]

Sujith Ravi and Kevin Knight (2011). Deciphering Foreign Language. Proceedings of ACL. [pdf]
3/4 | Yu, Kyle, Kevin | Decipherment, unsupervised POS tagging (cont'd), inference with PCFGs | Yu's slides about decipherment

Kevin's slides about Bayesian PCFG inference
Kristina Toutanova and Mark Johnson (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. Proceedings of NIPS. [pdf]

Mark Johnson, Thomas Griffiths and Sharon Goldwater (2007). Bayesian inference for PCFGs via Markov chain Monte Carlo. Proceedings of NAACL. [pdf]
3/11 | Shay, Daniel, Karl | Variational inference (cont'd), Bayesian logic programs, semantic parsing | the material was presented on the blackboard | Sindhu Raghavan, Raymond J. Mooney and Hyeonseo Ku (2012). Learning to "read between the lines" using Bayesian logic programs. Proceedings of ACL. [pdf] (Note that this paper is not strictly "Bayesian" in the traditional sense, but it is nevertheless an interesting paper to know about, and there was demand for it. Food for thought: how would we make Bayesian logic programs Bayesian in the full sense of the word?)

Bevan Jones, Mark Johnson and Sharon Goldwater (2012). Semantic parsing with Bayesian tree transducers. Proceedings of ACL. [pdf]
3/25 | Shay, Anahita, Anup | Variational inference (cont'd), Gaussian mixture models (GMMs), grammar induction | all material was presented on the blackboard | Stephen J. Roberts, Dirk Husmeier, William Penny and Iead Rezek (1998). Bayesian Approaches to Gaussian Mixture Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence. [pdf]

Kurihara and Sato (2006). Variational Bayesian grammar induction for natural language. International Colloquium on Grammatical Inference. [pdf]
4/1 | Shay, Chris, Swabha | Basics of sampling, semantic role induction, finite-state transducers | Chris's slides about semantic role induction | Titov and Klementiev (2012). A Bayesian Approach to Unsupervised Semantic Role Induction. Proceedings of EACL. [pdf]

Chiang et al. (2010). Bayesian Inference for Finite-State Transducers. Proceedings of NAACL. [pdf]
4/8 | Michael, Jessica, Krutika | Language modeling, sentiment mining | Michael's slides on Goldwater and Johnson

Jessica's slides on Teh's paper
Goldwater and Johnson (2004). Priors in Bayesian Learning of Phonological Rules. Proceedings of ACL SIG in Computational Phonology. [pdf]

Teh (2006). A hierarchical Bayesian language model based on Pitman–Yor processes. Proceedings of ACL. [pdf]

Davies and Ghahramani (2011). Language-independent Bayesian sentiment mining of Twitter. Proceedings of the Fifth Workshop on Social Network Mining and Analysis. [pdf]

4/15 | Shay | MCMC sampling | all material was presented on the blackboard | no reading material for this week
4/22 | Arvind, Shay | Machine translation, topic modeling, MCMC sampling | material was presented on the blackboard | Paul et al. (2011). Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT. Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties. [pdf]

Blunsom et al. (2009). A Gibbs Sampler for Phrasal Synchronous Grammar Induction. Proceedings of ACL. [pdf]

Wallach, Mimno and McCallum (2009). Rethinking LDA: Why Priors Matter. Proceedings of NIPS. [pdf]
4/29 | Kaili, Yi-Chen, Mohammad | Translation, summarization, adaptor grammars | Yi-Chen's slides on Daumé and Marcu | John DeNero, Alexandre Bouchard-Côté and Dan Klein (2008). Sampling alignment structure under a Bayesian translation model. Proceedings of EMNLP. [pdf]

Hal Daumé III and Daniel Marcu (2006). Bayesian query-focused summarization. Proceedings of ACL. [pdf]

Mark Johnson, Thomas L. Griffiths and Sharon Goldwater (2007). Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models. Proceedings of NIPS. [pdf]
5/6 | Wael, Shay | Bayesian nonparametrics, summary, history of Bayes' rule | Wael's slides

The Theory That Would Not Die, by Sharon Bertsch McGrayne
no reading material