COMS W4705: About


Instructor: Michael Collins

Course Description:

COMS W4705 is a graduate introduction to natural language processing, the study of human language from a computational perspective. We will cover syntactic, semantic and discourse processing models. The emphasis will be on machine learning or corpus-based methods and algorithms. We will describe the use of these methods and models in applications including syntactic parsing, information extraction, statistical machine translation, dialogue systems, and summarization.

Problem sets:

There will be four problem sets during the class, due roughly every three weeks. The problem sets will include both theoretical problems and programming assignments.

Exams:

There will be two midterms and a final in the class. The midterms will be held in class, on October 4th and November 1st.

Grading:

The overall grade will be determined roughly as follows: Midterms 30%, Final 40%, Problem sets 30%.
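
As an illustration of the weighting above, here is a minimal Python sketch of the arithmetic. It is not an official grade calculator. It assumes each component is scored on a 0-100 scale and that the two midterms have already been combined into a single score; the page does not say how they are combined. The function name and the sample scores are hypothetical.

```python
# Weights taken from the grading breakdown above, which the page describes as rough.
WEIGHTS = {"midterms": 0.30, "final": 0.40, "problem_sets": 0.30}


def overall_grade(scores):
    """Return the weighted average of the component scores (each on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"midterms": 80.0, "final": 90.0, "problem_sets": 70.0}
print(overall_grade(example))
```

Because the weights are only approximate, the result should be read as an estimate, not as the final grade.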

Syllabus:

Here is a tentative syllabus for the class:

Introduction (1 lecture)
Estimation techniques and language modeling (1 lecture)
Tagging, hidden Markov models (2 lectures)
Statistical parsing (4 lectures)
Log-linear models (2 lectures)
Natural language generation (1 lecture)
Machine translation (4 lectures)
Tree adjoining grammars, CCG (2 lectures)
Compositional semantics (2 lectures)
Conditional random fields, global linear models (2 lectures)
Word clustering (1 lecture)
Word sense disambiguation (1 lecture)

Readings:

Course readings will be available either on the web or as in-class handouts. There is no textbook for the class, but Jurafsky and Martin, Speech and Language Processing, 2nd Edition, will provide useful background for several of the lectures.