6.864: Natural Language Processing


Instructor: Michael Collins
Time & Location: Tuesdays & Thursdays, 1:00-2:30, Room 32-144
Office Hours: By appointment

TA: Igor Malioutov, igorm AT csail.mit.edu

Date Topic References
9/6 Part 1: Introduction and Overview; Part 2: Language Modeling

Here are some references on language modeling.
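To make the language-modeling material concrete, here is a small sketch (not from the lecture notes) of a bigram model with linear interpolation between bigram and unigram estimates. The toy corpus, interpolation weight, and sentence-boundary markers are all made-up illustrations.

```python
from collections import defaultdict

def train_bigram_lm(sentences):
    # Collect unigram and bigram counts, padding each sentence with
    # start/end markers (an assumption of this sketch).
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    total = 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w in tokens:
            unigram[w] += 1
            total += 1
        for w1, w2 in zip(tokens, tokens[1:]):
            bigram[(w1, w2)] += 1
    return unigram, bigram, total

def interp_prob(w2, w1, unigram, bigram, total, lam=0.8):
    # p(w2 | w1) = lam * p_ML(w2 | w1) + (1 - lam) * p_ML(w2)
    p_bi = bigram[(w1, w2)] / unigram[w1] if unigram[w1] else 0.0
    p_uni = unigram[w2] / total
    return lam * p_bi + (1 - lam) * p_uni

corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
uni, bi, n = train_bigram_lm(corpus)
# p("dog" | "the") = 0.8 * (1/2) + 0.2 * (1/10) = 0.42
print(interp_prob("dog", "the", uni, bi, n))
```

Interpolation is only one of several smoothing schemes discussed in the references; discounting and backoff methods follow the same pattern of mixing higher- and lower-order estimates.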
9/11 Parsing and Syntax 1
9/18 Parsing and Syntax 2

Chapter 14 (draft) of Jurafsky and Martin is available here; it covers much of the material from Parsing lectures 1, 2, and 3.

As additional reading, the Charniak (1997) paper is here. A journal paper describing the Collins (1997) model is here.
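The parsing lectures build on dynamic programming over spans; below is a minimal sketch of the CKY algorithm for a PCFG in Chomsky normal form. The toy grammar and rule probabilities are invented for illustration.

```python
import math
from collections import defaultdict

# Toy CNF grammar (an assumption of this sketch, not from the readings):
# binary rules (parent, left, right) -> prob, lexical rules (tag, word) -> prob.
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}

def cky(words):
    n = len(words)
    # chart[(i, j)][X] = max log-probability of X spanning words[i:j]
    chart = defaultdict(dict)
    back = {}
    for i, w in enumerate(words):
        for (tag, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][tag] = math.log(p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        score = math.log(p) + chart[(i, k)][b] + chart[(k, j)][c]
                        if score > chart[(i, j)].get(a, float("-inf")):
                            chart[(i, j)][a] = score
                            back[(i, j, a)] = (k, b, c)
    return chart, back

chart, back = cky(["she", "eats", "fish"])
# Probability of the best S over the whole sentence: 1.0 * 0.5 * 1.0 * 1.0 * 0.5 = 0.25
print(math.exp(chart[(0, 3)]["S"]))
```

The backpointer table makes it straightforward to recover the highest-scoring tree; the lexicalized models of Charniak and Collins refine the grammar but keep this chart-parsing skeleton.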

9/20 Parsing and Syntax 3
9/25 Log-linear models
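As a concrete reference for the log-linear models lecture, here is a hypothetical sketch of a conditional log-linear (maximum-entropy) classifier, p(y | x) = exp(w . f(x, y)) / sum over y' of exp(w . f(x, y')). The feature function and weights are made up for illustration.

```python
import math

labels = ["NOUN", "VERB"]

def features(x, y):
    # Indicator features on (word, label) and (suffix, label) pairs --
    # a made-up feature set for this sketch.
    return {("word=" + x, y): 1.0, ("suffix=" + x[-1:], y): 1.0}

def score(w, x, y):
    # Inner product w . f(x, y); absent features have weight zero.
    return sum(w.get(k, 0.0) * v for k, v in features(x, y).items())

def prob(w, x, y):
    # Softmax over the label set.
    scores = {yp: score(w, x, yp) for yp in labels}
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[y]) / z

w = {("word=runs", "VERB"): 2.0}
# With one active weight of 2.0, p = e^2 / (e^2 + e^0) ~ 0.881
print(round(prob(w, "runs", "VERB"), 3))
```

In practice the weights are estimated by maximizing conditional log-likelihood (often with regularization) rather than being set by hand as here.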
9/27 Tagging

Background Reading: Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. In Machine Learning, Special Issue on Natural Language Learning.

Background Reading: Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum Entropy Markov Models for Information Extraction and Segmentation. In Proceedings of ICML 2000.

Background Reading: Adwait Ratnaparkhi. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. In Proceedings of EMNLP 1997.
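For the tagging lecture, here is a toy Viterbi decoder for a bigram HMM tagger, a simpler model than the maximum-entropy approaches in the readings above; all transition and emission probabilities are invented.

```python
import math

tags = ["D", "N"]
# p(tag_i | tag_{i-1}), with "<s>" as the start state (toy numbers).
trans = {("<s>", "D"): 0.8, ("<s>", "N"): 0.2,
         ("D", "N"): 0.9, ("D", "D"): 0.1,
         ("N", "N"): 0.5, ("N", "D"): 0.5}
# p(word | tag), again toy numbers; unseen pairs get a tiny floor below.
emit = {("D", "the"): 0.6, ("N", "dog"): 0.4,
        ("N", "the"): 0.01, ("D", "dog"): 0.01}

def viterbi(words):
    pi = {"<s>": 0.0}  # log-prob of the best path ending in each state
    back = []
    for w in words:
        new_pi, bp = {}, {}
        for t in tags:
            e = emit.get((t, w), 1e-6)
            best_prev, best = None, float("-inf")
            for prev, s in pi.items():
                sc = s + math.log(trans.get((prev, t), 1e-6)) + math.log(e)
                if sc > best:
                    best, best_prev = sc, prev
            new_pi[t], bp[t] = best, best_prev
        pi = new_pi
        back.append(bp)
    # Trace backpointers to recover the best tag sequence.
    tag = max(pi, key=pi.get)
    seq = [tag]
    for bp in reversed(back[1:]):
        tag = bp[tag]
        seq.append(tag)
    return list(reversed(seq))

print(viterbi(["the", "dog"]))  # -> ['D', 'N']
```

An MEMM replaces the transition-times-emission product with a single conditional log-linear distribution over the next tag, but decoding still uses this same dynamic program.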

10/2 The EM algorithm, part 1
10/4 The EM algorithm, part 2

Required reading for today's lecture is here. You can read it either before or after the class.

Here's a brief note clarifying some of the identities in section 5.2.
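The two-coin problem is a standard worked example of EM (not necessarily the one used in lecture): two coins with unknown head probabilities, where each sequence of tosses was generated by one coin chosen at random, and we never observe which. The sketch below alternates an E-step (posterior over which coin produced each sequence) with an M-step (re-estimating each bias from fractional counts); all the numbers are invented.

```python
def em_two_coins(head_counts, n_tosses, theta, iters=20):
    tA, tB = theta  # initial guesses for the two head probabilities
    for _ in range(iters):
        # E-step: expected head/tail counts attributed to each coin.
        wA_h = wA_t = wB_h = wB_t = 0.0
        for h in head_counts:
            t = n_tosses - h
            likeA = tA ** h * (1 - tA) ** t
            likeB = tB ** h * (1 - tB) ** t
            pA = likeA / (likeA + likeB)  # posterior that coin A was used
            wA_h += pA * h; wA_t += pA * t
            wB_h += (1 - pA) * h; wB_t += (1 - pA) * t
        # M-step: maximum-likelihood re-estimates from the soft counts.
        tA = wA_h / (wA_h + wA_t)
        tB = wB_h / (wB_h + wB_t)
    return tA, tB

# Four sequences of 10 tosses with 9, 8, 2, 1 heads respectively.
tA, tB = em_two_coins([9, 8, 2, 1], 10, (0.6, 0.4))
print(round(tA, 2), round(tB, 2))  # converges near 0.85 and 0.15
```

Each iteration provably does not decrease the data likelihood, which is the key property of EM discussed in the reading.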

10/11 Machine Translation, Part 1

Jurafsky and Martin, Chapter 25, sections 25.1-25.3 and 25.9, cover much of the material from lecture.

If you're interested in reading more about the Bleu evaluation measure, the original paper is here. Another interesting paper on Bleu scores is here.
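To make the definition of Bleu concrete, here is a deliberately simplified single-reference sketch using n-grams only up to n=2 with uniform weights; the real measure uses n up to 4 and combines multiple references.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    # Geometric mean of modified n-gram precisions (assumes each
    # precision is nonzero, which holds for this toy example).
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped counts: each candidate n-gram is credited at most as
        # often as it appears in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        log_prec += math.log(clipped / sum(cand.values())) / max_n
    # Brevity penalty for candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
# p1 = 5/6, p2 = 3/5, no brevity penalty: sqrt(5/6 * 3/5) ~ 0.707
print(round(bleu(cand, ref), 3))
```

The clipping step is what stops a candidate from gaming the metric by repeating a common reference word.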

10/16 Machine Translation, Part 2

Jurafsky and Martin, Chapter 25, sections 25.5.1 and 25.6.1, cover IBM Model 1.

The original IBM paper is here. This is definitely not required reading, but you might find it interesting.

Here is a very cool article on statistical MT, by Kevin Knight -- again, not required reading but you might find it helpful.
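As a companion to the IBM Model 1 readings, here is a bare-bones sketch of EM estimation of the translation probabilities t(f | e) on a two-sentence toy corpus. Following the standard derivation, alignments are summed out analytically in the E-step; the NULL word and many details of the full model are omitted.

```python
from collections import defaultdict

def model1_em(pairs, iters=10):
    e_vocab = {e for _, es in pairs for e in es}
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialization of t(f | e).
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}
    for _ in range(iters):
        count = defaultdict(float)
        total = defaultdict(float)
        for fs, es in pairs:
            for f in fs:
                # Normalizer: sum over possible alignments of f.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z  # expected count of e generating f
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e] if total[e] else t[(f, e)]
    return t

pairs = [(["la", "maison"], ["the", "house"]),
         (["la", "fleur"], ["the", "flower"])]
t = model1_em(pairs)
print(t[("la", "the")], t[("maison", "the")])
```

Because "la" co-occurs with "the" in both pairs while "maison" and "fleur" each appear only once, EM pushes t(la | the) toward 1, the pigeonhole effect that makes Model 1 training work.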

10/18 Machine Translation, Part 3

Here are the slides from Philipp Koehn's tutorial. We covered slides 103-108 (extracting phrases) and slides 29-57 (decoding) in lecture.

Jurafsky and Martin, Chapter 25, section 25.4 covers phrase-based models, and section 25.8 covers decoding with them. Figure 25.30, which shows the decoding algorithm, is particularly important.

10/25 Machine Translation, Part 4

Slides on reordering approaches

10/30 Machine Translation, Part 5

Background reading: The paper by David Chiang is here. Sections 1-3 are most relevant to what we covered in class.

Additional reading: This paper may be of interest if you want to see how synchronous CFG (s-CFG) approaches can make use of treebank parses.

11/1 Global Linear Models, Part 1
11/6 Global Linear Models, Part 2

(Note: we didn't cover slides 30-36.)

This paper describes the perceptron algorithm for tagging.
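A heavily simplified sketch of perceptron training for tagging is below. Note the paper uses Viterbi decoding over a global feature vector; this sketch decodes greedily left to right to stay short, so treat it as an illustration of the update rule rather than the full algorithm. The feature templates and toy data are invented.

```python
from collections import defaultdict

def features(word, prev_tag, tag):
    # Two toy feature templates: (word, tag) and (previous tag, tag).
    return [("word", word, tag), ("prev", prev_tag, tag)]

def decode(words, w, tags):
    # Greedy left-to-right decoding (the paper uses Viterbi instead).
    out, prev = [], "<s>"
    for word in words:
        best = max(tags, key=lambda t: sum(w[f] for f in features(word, prev, t)))
        out.append(best)
        prev = best
    return out

def train(data, tags, epochs=5):
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            guess = decode(words, w, tags)
            if guess != gold:
                # Perceptron update: promote gold features, demote guessed ones.
                prev_g = prev_p = "<s>"
                for word, g, p in zip(words, gold, guess):
                    for f in features(word, prev_g, g):
                        w[f] += 1.0
                    for f in features(word, prev_p, p):
                        w[f] -= 1.0
                    prev_g, prev_p = g, p
    return w

data = [(["the", "dog"], ["D", "N"]), (["a", "cat"], ["D", "N"])]
w = train(data, ["D", "N"])
print(decode(["the", "cat"], w, ["D", "N"]))  # -> ['D', 'N']
```

In practice the averaged version of the weights is used at test time, which the paper shows gives large accuracy gains.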

11/13 Guest lecture by Jim Glass: Speech Recognition
11/15 Similarity Measures and Clustering
11/20 Computational models of discourse

Note: slides 1-42 were covered in the lecture.
11/27 Word-sense disambiguation

The paper by David Yarowsky on semi-supervised methods is here.

The paper on named-entity classification is here.
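The semi-supervised methods in both papers grow a classifier from a handful of seed rules: label whatever the current rules can reach, learn new rules from the newly labelled examples, and repeat. Here is a schematic sketch of that loop, with a bare collocation-to-sense lookup standing in for the weighted decision lists Yarowsky actually uses; the toy contexts and seed rules are invented.

```python
def bootstrap(contexts, rules, iters=5):
    labels = {}
    for _ in range(iters):
        # Labelling step: assign a sense to any context matching a rule.
        for i, ctx in enumerate(contexts):
            for word in ctx:
                if word in rules:
                    labels[i] = rules[word]
                    break
        # Rule-growing step: every word in a labelled context becomes a rule.
        # (The real algorithm keeps only high-confidence decision-list entries.)
        for i, sense in labels.items():
            for word in contexts[i]:
                rules.setdefault(word, sense)
    return labels

# Toy contexts for the ambiguous word "plant", plus two seed rules.
contexts = [["life", "growth"], ["growth", "soil"],
            ["manufacturing", "equipment"], ["equipment", "workers"]]
labels = bootstrap(contexts, {"life": "living", "manufacturing": "factory"})
print({i: labels[i] for i in sorted(labels)})
# -> {0: 'living', 1: 'living', 2: 'factory', 3: 'factory'}
```

Contexts 1 and 3 match no seed, but are reached on the second pass through rules learned from contexts 0 and 2; that transitive spread is the core idea of the bootstrapping approach.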

11/29 Global Linear Models, Part 3: Dependency Parsing

Chapter 3 of Ryan McDonald's thesis has an explanation of the dynamic programming algorithm for dependency parsing.
12/6, 12/11 Learning of CCGs