COMS E6998: Machine Learning for Natural Language Processing (Spring 2012)





Lecturer: Prof. Michael Collins (Office hours Thursdays, 1-2pm, room 723 CEPSR)


  • Yin-Wen Chang (office hours Tues 1-2, Weds 11-12, in CEPSR 701).
  • Karl Stratos (office hours Mon 1.30-2.30, Fri 1.30-2.30, in CEPSR 721).

    Lectures: Wednesdays 4.10-6.00pm, in Hamilton 503


    • Students will need a solid background in: (1) algorithms (e.g., a prior class at the level of this class); (2) probability (e.g., this textbook is highly recommended; chapters 1 and 2 should be good background for this course).

    • A prior class in machine learning and/or natural language processing is recommended. (In particular, if you're interested in a first class in NLP, then COMS 4705, taught in the fall, may be more appropriate.)

    Course description:

    This is an advanced course in machine learning for natural language processing. The methods we cover are relevant to many NLP applications, such as machine translation, dialog systems, natural language parsing, and information extraction. The course will cover the following topics:

    • Models for structured prediction: e.g., hidden Markov models, maximum-entropy Markov models, conditional random fields, probabilistic context-free grammars, synchronous context-free grammars, dependency parsing models, max-margin methods for structured prediction.

    • Unsupervised and semi-supervised learning methods: e.g., the EM algorithm, methods that derive lexical representations from unlabeled data, co-training algorithms, methods based on canonical correlation analysis (CCA).

    • Inference algorithms: e.g., dynamic programming algorithms, belief propagation, methods based on linear programming and integer linear programming, methods based on dual decomposition and Lagrangian relaxation.


    There is no textbook for the course. Throughout the course, we will make use of research papers as readings. The following book may provide useful background, particularly for the early part of the course:


    There will be 3 homeworks (30% of the final grade), a final class project (40% of the final grade), and one 2-hour in-class exam (30% of the final grade; date TBD).