CS 4705: Introduction to Natural Language Processing, Fall 2017

Course Information
Time: TTh 4:30-5:10
Place: TBD, Mudd
Professor: Kathleen McKeown
Office Hours: Tu 5:30-6:30, We 4-5, 722 CEPSR
Email: kathy@cs.columbia.edu
Phone: 212-939-7114
Teaching Assistants:
Fei-Tzin Lee, fl2301@columbia.edu, Office Hours TBD
Elsbeth Turcan, ect2150@columbia.edu, Office Hours TBD
Siddharth Varia, sv2504@columbia.edu, Office Hours TBD

Course Description

This course provides an introduction to the field of natural language processing (NLP). We will learn how to create systems that can analyze, understand, and produce language. We will begin by discussing core NLP tasks, such as language modeling, part-of-speech tagging, and parsing. We will also discuss applications such as information extraction, machine translation, automatic summarization, and question answering. The course will primarily cover statistical and machine-learning-based approaches to language processing, but it will also introduce the linguistic concepts that play a role. We will study machine learning methods currently used in NLP, including supervised machine learning, hidden Markov models, and neural networks. Homework assignments will include both written components and programming assignments.

Requirements

The course requires four homework assignments, a midterm, and a final exam. Each student is allowed a total of 4 late days on homework assignments, no questions asked; after those are used, 10% per late day will be deducted from the homework grade unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.

We will use Google Cloud for the course. Stay tuned for instructions on how to sign up for course credits.

Textbook

Speech and Language Processing, 2nd Edition, by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library.

Neural Network Methods for Natural Language Processing by Yoav Goldberg. It is available online, but you can also purchase a hard copy from the publisher.

Syllabus

This syllabus is still subject to change, and readings may change as well, but it will give you a good idea of what we will cover.

Week | Class | Topic | Reading | Assignments
1 | Sep 5 | Introduction and Course Overview | Ch 1, Speech and Language |
  | Sep 7 | Language modeling | Ch 4, Speech and Language |
2 | Sep 12 | Supervised machine learning, text classification | Ch 2, Neural Nets | HW1: Republican or Democrat?
  | Sep 14 | Supervised machine learning | |
3 | Sep 19 | Methods: Hidden Markov Modeling | Ch 5.1-5.5, Speech and Language |
  | Sep 21 | POS tagging | Ch 6.1-6.5, Speech and Language |
4 | Sep 26 | Syntax and Grammars | Ch 12, Speech and Language | HW1 due
  | Sep 28 | Parsing | Ch 13, Speech and Language | HW2: Parsing
5 | Oct 3 | Dependency Parsing | Ch 14.6, Speech and Language |
  | Oct 5 | Supervised learning of parsers, evaluation | Ch 14, Speech and Language |
6 | Oct 10 | Introduction to semantics | Ch 17, Speech and Language |
  | Oct 12 | Lexical Semantics, Distributed semantics | Ch 19, Ch 20.1-20.8, Speech and Language | HW2 due
7 | Oct 17 | Semantic role labeling | Ch 20.9 |
  | Oct 19 | Midterm | |
8 | Oct 24 | Neural nets | Ch 3, 4, Neural Nets |
  | Oct 26 | Neural nets | Ch 5, Neural Nets |
9 | Oct 31 | NN example: semantic similarity | Ch 10, 11, Neural Nets | HW3: Neural Nets (written)
  | Nov 2 | NN: RNNs and Sentiment Analysis | Ch 14, Ch 16.1, Neural Nets |
10 | Nov 9 | Summarization | Ch 23.3-23.8, Speech and Language | HW3 due
11 | Nov 14 | Summarization: abstractive | Papers | HW4: Summarization
  | Nov 16 | Machine Translation | Ch 25.1-25.9, Speech and Language |
12 | Nov 21 | Machine Translation | Ch 17, Neural Nets |
  | Nov 23 | Thanksgiving | |
13 | Nov 28 | NN: Image Captioning | Show and Tell |
  | Nov 30 | Information extraction | Ch 21.1-21.4, Speech and Language |
14 | Dec 5 | Inference, entailment | Papers | HW4 due
  | Dec 17 | Poetry, dialog | Papers |

Announcements

Check Piazza for announcements, your grades (only you will see them), and discussion. Please post all questions on Piazza rather than emailing Professor McKeown or the TAs, who will monitor the discussion lists to answer questions.

Academic Integrity

Copying or paraphrasing someone else's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or a TA in advance of the due date.