
TA: Igor Malioutov, igorm AT csail.mit.edu
Date  Topic  References 
9/6  Part 1: Introduction and Overview
Part 2: Language Modeling  Here are some references on language modeling. 
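As a concrete illustration of the kind of model discussed in the language modeling lecture, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. This is my own illustrative code, not part of the course materials, and all function names are invented:

```python
from collections import defaultdict

def train_bigram_counts(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    uni, bi = defaultdict(int), defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w in tokens:
            uni[w] += 1
        for w1, w2 in zip(tokens, tokens[1:]):
            bi[(w1, w2)] += 1
    return uni, bi

def bigram_prob(uni, bi, w1, w2, vocab_size):
    """Add-one smoothed estimate of P(w2 | w1)."""
    return (bi[(w1, w2)] + 1) / (uni[w1] + vocab_size)
```

Add-one smoothing is the simplest choice here; the lectures discuss better-performing alternatives such as interpolation and discounting.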
9/11  Parsing and Syntax 1  
9/18  Parsing and Syntax 2  Chapter 14 (draft) of Jurafsky and Martin is available
here.
It covers a lot of the material from Parsing lectures 1, 2, and 3.
As additional reading, the Charniak (1997) paper is here. A journal paper describing the Collins (1997) model is here. 
9/20  Parsing and Syntax 3  
9/25  Log-linear models  
9/27  Tagging 
Background reading:
Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. In Machine Learning, Special Issue on Natural Language Learning.
Andrew McCallum, Dayne Freitag and Fernando Pereira. Maximum Entropy Markov Models for Information Extraction and Segmentation. In Proceedings of ICML 2000.
Adwait Ratnaparkhi. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. In Proceedings of EMNLP 1997. 
10/2  The EM algorithm, part 1  
10/4  The EM algorithm, part 2  Required reading for today's lecture is
here. You can read this either before or after the class.
Here's a brief note clarifying some of the identities in section 5.2. 
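To make the EM iteration concrete, here is a self-contained sketch of EM on a classic toy problem: two biased coins whose identity on each trial is hidden. This example and its function names are my own, not taken from the required reading:

```python
def em_two_coins(flip_counts, theta_a, theta_b, iters=20):
    """EM for a mixture of two biased coins.
    flip_counts: list of (heads, tails) per trial; the coin used is hidden.
    theta_a, theta_b: initial heads probabilities for the two coins."""
    for _ in range(iters):
        # E-step: posterior responsibility of coin A for each trial
        ha = ta = hb = tb = 0.0
        for h, t in flip_counts:
            la = (theta_a ** h) * ((1 - theta_a) ** t)
            lb = (theta_b ** h) * ((1 - theta_b) ** t)
            ra = la / (la + lb)
            ha += ra * h; ta += ra * t
            hb += (1 - ra) * h; tb += (1 - ra) * t
        # M-step: re-estimate each coin's heads probability
        theta_a = ha / (ha + ta)
        theta_b = hb / (hb + tb)
    return theta_a, theta_b
```

Each iteration is guaranteed not to decrease the likelihood of the observed flips, which is the key property discussed in the lectures.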
10/11  Machine Translation, Part 1 
Jurafsky and Martin, Chapter 25
sections 25.1–25.3 and 25.9,
cover much of the material covered in lecture.
If you're interested in reading more about the Bleu evaluation measure, the original paper is here. Another interesting paper on Bleu scores is here. 
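For readers curious how Bleu is computed, the following is a simplified sentence-level sketch (real Bleu is corpus-level and supports multiple references); the function name and details are illustrative, not taken from the papers:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level Bleu against a single reference:
    geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # brevity penalty punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)
```

The clipping step is what makes the precisions "modified": a candidate cannot get credit for repeating an n-gram more often than it appears in the reference.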
10/16  Machine Translation, Part 2 
Jurafsky and Martin, Chapter 25
sections 25.5.1 and 25.6.1 cover IBM Model 1.
The original IBM paper is here. This is definitely not required reading, but you might find it interesting. Here is a very cool article on statistical MT, by Kevin Knight; again, not required reading, but you might find it helpful. 
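The EM procedure for IBM Model 1 is compact enough to sketch in a few lines. The following is an illustrative implementation of estimating the word-translation parameters t(f|e); the function and variable names are my own, and the uniform initialization is a simplification:

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1 translation probabilities t(f | e).
    pairs: list of (foreign_sentence, english_sentence) token lists."""
    # uniform initialization: every co-occurring pair starts equal
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in pairs:
            for f in f_sent:
                # E-step: normalize alignment mass for f over all e in the sentence
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f | e) from expected counts
        t = defaultdict(float, {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t
```

Even on a tiny parallel corpus, the expected counts quickly concentrate on the consistent word pairings, which is the behavior the lectures use to motivate the model.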
10/18  Machine Translation, Part 3 
Here
are the slides from Philipp Koehn's tutorial.
We covered slides 103–108 (extracting phrases) and
slides 29–57 (decoding) in lecture.
Jurafsky and Martin, Chapter 25: section 25.4 covers phrase-based models. Section 25.8 covers decoding of phrase-based models. Figure 25.30, which shows the decoding algorithm, is particularly important. 
10/25  Machine Translation, Part 4  
10/30  Machine Translation, Part 5 
Background reading:
The paper by David Chiang is
here. Sections 1–3 are most relevant for what we covered
in class.
Additional reading: This paper might be of interest if you're interested in how sCFG approaches can make use of treebank parses. 
11/1  Global Linear Models, Part 1  
11/6  Global Linear Models, Part 2 
(Note: we didn't cover slides 30–36.)
This paper describes the perceptron algorithm for tagging. 
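The perceptron algorithm for tagging can be sketched roughly as follows. Note that this simplified version uses greedy left-to-right decoding rather than the Viterbi search used in the paper, and all feature and function names are my own illustrative choices:

```python
from collections import defaultdict

def features(word, prev_tag, tag):
    """Local features: a word/tag indicator and a tag-bigram indicator."""
    return [("w+t", word, tag), ("t-1+t", prev_tag, tag)]

def greedy_tag(weights, words, tagset):
    """Left-to-right greedy decoding (a simplification of Viterbi search)."""
    tags, prev = [], "<s>"
    for w in words:
        best = max(tagset, key=lambda t: sum(weights[f] for f in features(w, prev, t)))
        tags.append(best)
        prev = best
    return tags

def train_perceptron(data, tagset, epochs=5):
    """Perceptron updates: promote gold features, demote predicted ones."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = greedy_tag(weights, words, tagset)
            prev_g = prev_p = "<s>"
            for w, g, p in zip(words, gold, pred):
                if g != p:
                    for f in features(w, prev_g, g):
                        weights[f] += 1.0
                    for f in features(w, prev_p, p):
                        weights[f] -= 1.0
                prev_g, prev_p = g, p
    return weights
```

The update is the essential idea: whenever the model's prediction disagrees with the gold tag, weights move toward the gold features and away from the predicted ones; the paper adds exact search and weight averaging on top of this.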
11/13  Guest lecture by Jim Glass: Speech Recognition  
11/15  Similarity Measures and Clustering  
11/20  Computational models of discourse  Note: slides 1–42 were covered in the lecture. 
11/27  Word-sense disambiguation 
The paper by David Yarowsky on semi-supervised methods is
here.
The paper on named-entity classification is here. 
11/29  Global Linear Models, Part 3: Dependency Parsing  Chapter 3 of Ryan McDonald's thesis has an explanation of the dynamic programming algorithm for dependency parsing. 
12/6, 12/11  Learning of CCGs 