Michael Collins, mcollins AT csail.mit.edu
Time & Location:
Tues & Thurs 1-2.30, 32-144
Igor Malioutov, igorm AT csail.mit.edu
6.864 is a graduate introduction to natural language processing, the
study of human language from a computational perspective. We will
cover syntactic, semantic and discourse processing models. The
emphasis will be on machine learning or corpus-based methods and
algorithms. We will describe the use of these methods and models in
applications including syntactic parsing, information extraction,
statistical machine translation, dialogue systems, and summarization.
This subject qualifies as an Artificial Intelligence and Applications
There will be 4 problem sets during the class, due roughly every
two weeks. The problem sets will include both theoretical problems and
some programming assignments.
There will be a mid-term and a final in the class.
There will be a final project for the class; more details to follow.
The overall grade will be determined roughly as follows:
Midterm 20%, Final exam 30%, Problem sets 25%, Final project 25%.
Here is a tentative syllabus for the class:
- Introduction (1 lecture)
- Estimation techniques, and language modeling (1 lecture)
- Parsing and Syntax (4 lectures)
- Log-linear models (1 lecture)
- Stochastic tagging (1 lecture)
- History-based models (1 lecture)
- The EM algorithm in NLP (2 lectures)
- Machine Translation (3 lectures)
- Global linear models (2 lectures)
- Discourse Processing: segmentation, anaphora resolution (2 lectures)
- Probabilistic similarity measures and clustering (1 lecture)
- Word-sense disambiguation (1 lecture)
- Information extraction (1 lecture)
- Unsupervised/semi-supervised learning in NLP (1 lecture)
- Tree-adjoining grammar, combinatory categorial grammars (2 lectures)
Course readings will be available either on the web or in-class