Lecturer: Prof. Michael Collins (office hours: Wednesdays, 2-3pm, room 723 CEPSR)
This is an advanced course in machine learning for natural language processing. The methods we will cover are relevant to many NLP applications, for example machine translation, dialog systems, natural language parsing, and information extraction. The course will cover the following topics:
- *Models for structured prediction*: e.g., hidden Markov models, maximum-entropy Markov models, conditional random fields, probabilistic context-free grammars, synchronous context-free grammars, dependency parsing models, and max-margin methods for structured prediction.
- *Inference algorithms*: e.g., dynamic programming algorithms, belief propagation, methods based on linear programming and integer linear programming, and methods based on dual decomposition and Lagrangian relaxation.
- *Semi-supervised learning methods*: e.g., methods that derive lexical representations from unlabeled data, co-training algorithms, and methods based on canonical correlation analysis (CCA).
Prerequisites:
A graduate-level course in machine learning (e.g., COMS W4771) or natural language processing (e.g., COMS W4705).
There is no textbook for the course; readings will be drawn from research papers. The following book may provide useful background, particularly for the early part of the course:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing (2nd Edition). Pearson Prentice Hall, 2009.
Assignments:
There will be three homeworks (25% of the final grade) and a final project (65% of the final grade); the remaining 10% of the final grade will depend on class participation.