COMS W4705: About
Instructor:
Michael Collins
Course Description:
COMS W4705 is a graduate introduction to natural language processing, the
study of human language from a computational perspective. We will
cover syntactic, semantic and discourse processing models. The
emphasis will be on machine learning or corpus-based methods and
algorithms. We will describe the use of these methods and models in
applications including syntactic parsing, information extraction,
statistical machine translation, dialogue systems, and summarization.
Problem sets:
There will be 4 problem sets during the class, due roughly every
three weeks. The problem sets will include both theoretical problems and
programming assignments.
Exams:
There will be two midterms and a final exam in the class.
The midterms will be held in class, on October 4th and November 1st.
Grading:
The overall grade will be determined roughly as follows:
Midterms 30%, Final 40%, Problem sets 30%.
Syllabus:
Here is a tentative syllabus for the class:
- Introduction (1 lecture)
- Estimation techniques, and language modeling (1 lecture)
- Tagging, hidden Markov models (2 lectures)
- Statistical parsing (4 lectures)
- Log-linear models (2 lectures)
- Natural language generation (1 lecture)
- Machine translation (4 lectures)
- Tree adjoining grammars, CCG (2 lectures)
- Compositional semantics (2 lectures)
- Conditional random fields, global linear models (2 lectures)
- Word clustering (1 lecture)
- Word sense disambiguation (1 lecture)
Readings:
Course readings will be available either on the web or as in-class
handouts. There is no textbook for the class, but Jurafsky and Martin,
Speech and Language Processing, 2nd Edition, will provide useful
background for several of the lectures.