COMS W4705: Natural Language Processing

Instructor: Michael Collins
Time & Location: Tues & Thurs, 4:10-5:25 PM, 535 Mudd
Office Hours: TBD

TAs: Please send all questions to nlpfall2013.columbia at gmail dot com
Hyungtae Kim [hk2561] (OH Thursday 1:00-2:30 PM, in Mudd TA Room)
Mohammad Sadegh Rasooli [mr3254] (OH Tuesday 1:45-3:15 PM, in 7LE5 CEPSR NLP Lab)
Victor Soto [vs2411] (OH Monday 4:00-5:30 PM, in 7LE5 CEPSR NLP Lab)
Yanting Zhao [yz2487] (OH Wednesday 4:00-5:30 PM, in 7LE5 CEPSR NLP Lab)

Alexander Rush [srush@csail.mit.edu] (OH Thursday 1:00-2:00 PM, in 701 CEPSR "Aquarium")

Announcements:

The mid-term for the class is on October 15th, in class. It is closed book, but you may bring one letter-sized page of notes to the exam (you can use both sides of the page). The mid-term will cover all material up to and including the lecture on 9/26 (i.e., everything through the end of the parsing lectures). Past mid-terms are here: fall 2011, fall 2012.

This year we will be using Piazza for open discussion of the lectures and homeworks. Please sign up here.

A substantial portion of this class was offered on Coursera in Spring 2013. You may want to sign up at this link so that you can view the Coursera video lectures for the topics in this class; the video lectures follow the content of the class very closely.


Lectures:


Date | Topic | References
9/3 | Introduction and Overview
9/5 | Language Modeling | Notes on language modeling (required reading)
9/10 | Tagging and Hidden Markov Models | Notes on HMMs (required reading)
9/12 | Tagging and Hidden Markov Models (continued)
9/17 | Parsing and Context-Free Grammars | Notes on PCFGs (required reading)
9/19 | Parsing, context-free grammars, and probabilistic CFGs (continued)
9/24 | Weaknesses of PCFGs; Lexicalized probabilistic CFGs | Notes on Lexicalized PCFGs (required reading)
9/26 | Models 1 and 2 from Collins, 1999 | Additional material on lexicalized PCFGs
10/1 | Guest lecture by Nizar Habash
10/3 | Machine translation part 1; Machine translation part 2 | Notes on IBM Models 1 and 2 (required reading)
10/8 | Machine translation part 2 (continued); Machine translation evaluation
10/10 | Phrase-based translation models | Notes on phrase-based models (required reading); Slides from the tutorial by Philipp Koehn
10/15 | Midterm (in class)
10/17 | Phrase-based translation models: the decoding algorithm; Reordering for statistical MT
10/22 | Log-linear models | Notes on log-linear models (required reading)
10/24 | Guest lecture: Joint Decoding | Tutorial on dual decomposition
10/31 | Log-linear tagging (MEMMs)
11/7 | Global linear models | Notes on MEMMs and CRFs
11/12 | Global linear models, part II
11/14 | Global linear models, part III
11/19 | The Brown word-clustering algorithm
11/21 | Semi-supervised learning for word-sense disambiguation, and co-training for named-entity detection
11/21 | Guest lecture: WordsEye
12/3 | The EM algorithm for Naive Bayes | Notes on the EM algorithm for Naive Bayes (Sections 4 and 6 provide useful technical background, but can be safely skipped)
12/5 | Review of Models | Short Answer Questions