COMS W4705: About 



Instructor:
Michael Collins
Course Description:
COMS W4705 is a graduate introduction to natural language processing, the
study of human language from a computational perspective. We will
cover syntactic, semantic and discourse processing models. The
emphasis will be on machine learning or corpus-based methods and
algorithms. We will describe the use of these methods and models in
applications including syntactic parsing, information extraction,
statistical machine translation, dialogue systems, and summarization.
Problem sets:
There will be 4 problem sets during the class, due roughly every
three weeks. The problem sets will include both theoretical problems and
programming assignments.
Exams:
There will be a midterm and a final in the class.
The midterm will be held in class in mid-October.
Grading:
The overall grade will be determined roughly as follows:
Midterm 25%, Final 40%, Problem sets 35%.
Syllabus:
Here is a tentative syllabus for the class:
 Introduction (1 lecture)
 Estimation techniques, and language modeling (1 lecture)
 Tagging, hidden Markov models (2 lectures)
 Statistical parsing (4 lectures)
 Log-linear models (2 lectures)
 Natural language generation (1 lecture)
 Machine translation (4 lectures)
 Tree adjoining grammars, CCG (2 lectures)
 Compositional semantics (2 lectures)
 Conditional random fields, global linear models (2 lectures)
 Word clustering (1 lecture)
 Word sense disambiguation (1 lecture)
Readings:
Course readings will be available either on the web or as in-class
handouts. There is no textbook for the class, but
Jurafsky and Martin,
Speech and Language Processing, 2nd Edition, will provide useful
background for several of the lectures.