General details

This class is a graduate seminar for students who are interested in expanding their knowledge about the use of Bayesian statistics in natural language processing.

The class has two parts. In the first, the instructor lectures on various topics in Bayesian NLP. In the second, participants lead discussions of papers in the area, with different participants leading each week.

Grades will be based on the discussions led in class, participation, and possibly a white paper: a short paper in which students summarize one or two related papers and suggest some future directions for exploration.

To make the most of this class, participants are expected to come with basic knowledge of probability and statistics and basic familiarity with NLP research.

If you have any questions about the seminar, email me at scohen [strudel] cs.columbia.edu.

Lecture topics

The following are some topics to be covered during the lectures:

General introduction to the Bayesian approach to statistics
Prior distributions used in NLP
Bayesian inference (variational methods, sampling methods, etc.)
Principal Bayesian NLP models

To get a feel for some of the core material in this area, check out Sharon Goldwater's Bayesian language modeling reading list. A lot of progress has been made since this list was last updated, but it covers some of the basic papers and reading material that participants could choose to discuss (students may, of course, choose newer papers to present). In the first week of the class, we will compile a more thorough list of papers to choose from. A current seed version of the list exists here, as a PDF file.

News

5/6 - Last class this semester. Thank you, everybody, for the hard work!
3/10 - We have a tentative schedule for paper discussions until 5/6. If you did not receive an email from me about presenting a paper, and would like to do so, email me as soon as possible.
2/12 - We have a tentative schedule for paper discussions until 4/1.
2/3 - We compiled a list of Bayesian NLP papers. The list can be found here. Next week we will start building a schedule for group discussions.
2/1 - The class now has a mailing list, coms-e6998-11@lists.cs.columbia.edu - you should subscribe to it if you attend the class.

Tentative schedule and past classes

Date	Lecturer	Topics	Notes	Reading material
1/28	Shay	Basic refresher on Probability and Statistics (statistical independence, conditional independence, Bayes' theorem), the Bayesian approach, hypothesis testing, priors in general, Bayesian updating	Slides (most material was presented on the blackboard)	none
2/4	Shay	Priors, PCFGs, multinomials, conjugacy, Dirichlet distributions, Bayesian point estimate; QA session about the material read	Slides (most material was presented on the blackboard)	Chapter 2; Optional: Chapter 1
2/11	Bob Carpenter (guest lecture)	Bayesian domain adaptation	Stan Modeling Language Reference Manual; Bob's blog post about Bayesian inference	Jenny Rose Finkel and Christopher D. Manning (2009). Hierarchical Bayesian Domain Adaptation. Proceedings of NAACL. [pdf] Optional reading (for data motivation): John Blitzer, Mark Dredze and Fernando Pereira (2007). Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Proceedings of ACL. [pdf]
2/18	Rachel, Armineh	Bayesian estimation, basic comparison of inference methods	Slides; Armineh's notes for Gao and Johnson (2008)	Chapter 3; Jianfeng Gao and Mark Johnson (2008). A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. Proceedings of EMNLP. [pdf]
2/25	Shay, Joe	Variational inference, unsupervised POS tagging	Slides (most material was presented on the blackboard)	Sharon Goldwater and Thomas L. Griffiths (2007). A fully Bayesian approach to unsupervised part-of-speech tagging. Proceedings of ACL. [pdf] Sujith Ravi and Kevin Knight (2011). Deciphering Foreign Language. Proceedings of ACL. [pdf]
3/4	Yu, Kyle, Kevin	Decipherment, unsupervised POS tagging (cont'd), inference with PCFGs	Yu's slides about decipherment; Kevin's slides about Bayesian PCFG inference	Kristina Toutanova and Mark Johnson (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. Proceedings of NIPS. [pdf] Mark Johnson, Thomas L. Griffiths and Sharon Goldwater (2007). Bayesian inference for PCFGs via Markov chain Monte Carlo. Proceedings of NAACL. [pdf]
3/11	Shay, Daniel, Karl	Variational inference (cont'd), Bayesian logic programs, semantic parsing	the material was presented on the blackboard	Sindhu Raghavan, Raymond J. Mooney and Hyeonseo Ku (2012). Learning to "read between the lines" using Bayesian logic programs. Proceedings of ACL. [pdf] (Note that this paper is not strictly a "Bayesian" paper in the traditional sense, but it is nevertheless an interesting paper to know about, and there was demand for it. Food for thought: how would we make Bayesian logic programs Bayesian in the full sense of the word?) Bevan Jones, Mark Johnson and Sharon Goldwater (2012). Semantic parsing with Bayesian tree transducers. Proceedings of ACL. [pdf]
3/25	Shay, Anahita, Anup	Variational inference (cont'd), GMMs, grammar induction	all material was presented on the blackboard	Stephen J. Roberts, Dirk Husmeier, William Penny and Iead Rezek (1998). Bayesian Approaches to Gaussian Mixture Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence. [pdf] Kurihara and Sato (2006). Variational Bayesian grammar induction for natural language. International Colloquium on Grammatical Inference. [pdf]
4/1	Shay, Chris, Swabha	Basics of sampling, semantic role induction, finite-state transducers	Chris's slides about semantic role induction	Titov and Klementiev (2012). A Bayesian Approach to Unsupervised Semantic Role Induction. Proceedings of EACL. [pdf] Chiang et al. (2010). Bayesian Inference for Finite-State Transducers. Proceedings of NAACL. [pdf]
4/8	Michael, Jessica, Krutika	Language modeling, sentiment mining	Michael's slides on Goldwater and Johnson; Jessica's slides on Teh's paper	Goldwater and Johnson (2004). Priors in Bayesian Learning of Phonological Rules. Proceedings of ACL SIG in Computational Phonology. [pdf] Teh (2006). A hierarchical Bayesian language model based on Pitman–Yor processes. Proceedings of ACL. [pdf] Davies and Ghahramani (2011). Language-independent Bayesian sentiment mining of Twitter. The Fifth Workshop on Social Network Mining and Analysis. [pdf]
4/15	Shay	MCMC sampling	all material was presented on the blackboard	no reading material for this week
4/22	Arvind, Shay	Machine translation, topic modeling, MCMC sampling	material was presented on the blackboard	Paul et al. (2011). Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT. Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties. [pdf] Blunsom et al. (2009). A Gibbs Sampler for Phrasal Synchronous Grammar Induction. Proceedings of ACL. [pdf] Wallach, Mimno and McCallum (2009). Rethinking LDA: Why Priors Matter. Proceedings of NIPS. [pdf]
4/29	Kaili, Yi-Chen, Mohammad	Translation, summarization, adaptor grammars	Yi-Chen's slides on Daumé and Marcu	John DeNero, Alexandre Bouchard-Côté and Dan Klein (2008). Sampling alignment structure under a Bayesian translation model. Proceedings of EMNLP. [pdf] Hal Daumé III and Daniel Marcu (2006). Bayesian query-focused summarization. Proceedings of ACL. [pdf] Mark Johnson, Thomas L. Griffiths and Sharon Goldwater (2007). Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models. Proceedings of NIPS. [pdf]
5/6	Wael, Shay	Bayesian nonparametrics, summary, history of Bayes' rule	Wael's slides; The Theory That Would Not Die, by Sharon Bertsch McGrayne	no reading material