COMS W4705: Natural Language Processing




Instructor:
Michael Collins
Time & Location:
Tues & Thurs 4.10-5.25, 535 Mudd
Office Hours: TBD
TAs:
Please send all questions to nlpfall2013.columbia at gmail dot com
Hyungtae Kim [hk2561] (OH Thursday 1.00-2.30 PM, in Mudd TA Room)
Mohammad Sadegh Rasooli [mr3254] (OH Tuesday 1.45-3.15 PM, in 7LE5 CEPSR NLP Lab)
Victor Soto [vs2411] (OH Monday 4.00-5.30 PM, in 7LE5 CEPSR NLP Lab)
Yanting Zhao [yz2487] (OH Wednesday 4.00-5.30 PM, in 7LE5 CEPSR NLP Lab)
Alexander Rush [srush@csail.mit.edu] (OH Thursday 1.00-2.00 PM, in 701 CEPSR "Aquarium")
Announcements:
The midterm for the class is on October 15th, in class. It is closed book,
but you may take one letter-sized page of notes to the exam (you can use both
sides of the page). The midterm will cover all material up to and including the lecture on
9/26 (i.e., everything up to the end of the parsing lectures).
Past midterms are here:
fall 2011,
fall 2012.
This year we will be using Piazza to have open discussions on the lectures and homeworks. Please sign up here.
A substantial portion of this class was offered on Coursera in Spring 2013.
You may want to sign up at Coursera so that
you can view video lectures for the topics in this class at
this link. The video lectures will follow the content of the class
very closely.
Lectures:
Date 
Topic 
References 
9/3 
Introduction and Overview

9/5 
Language Modeling 
Notes on language modeling (required reading)

9/10 
Tagging, and Hidden Markov Models
 Notes on HMMs
(required reading)

9/12 
Tagging, and Hidden Markov Models (continued)


9/17 
Parsing, and Context-free Grammars

Note on PCFGs (required reading)

9/19 
Parsing,
context-free grammars, and probabilistic CFGs (continued)


9/24 
Weaknesses of PCFGs,
Lexicalized
probabilistic CFGs

Note on Lexicalized PCFGs (required reading)

9/26 
Models 1 and 2 from Collins, 1999
(additional material on lexicalized PCFGs)


10/1 
Guest lecture
by Nizar Habash


10/3 
Machine translation part 1,
Machine translation part 2

Note on IBM Models 1 and 2 (required reading)

10/8 
Machine translation part 2, continued.
Machine translation evaluation


10/10 
Phrase-based translation models

Note on phrase-based models (required reading)
Slides from the tutorial by Philipp Koehn

10/15 
Midterm (in class)


10/17 
Phrase-based translation models: the decoding algorithm,
Reordering for statistical MT


10/22 
Log-linear models

Note on log-linear models (required reading).

10/24 
Guest lecture: Joint Decoding
 Tutorial on dual decomposition 
10/31 
Log-linear tagging (MEMMs)


11/7 
Global linear models
 Note on MEMMs and CRFs.

11/12 
Global linear models part II


11/14 
Global linear models part III


11/19 
The Brown word-clustering algorithm


11/21 
Semi-supervised learning for
word-sense disambiguation,
and
co-training for named-entity detection 

11/21 
Guest Lecture: WordsEye 

12/3 
The EM algorithm for Naive Bayes

Notes on the EM algorithm for Naive Bayes
(Sections 4 and 6 provide useful technical background, but can be safely skipped.)

12/5 
Review of Models
 Short Answer Questions
