CS 4705: Introduction to Natural Language Processing, Fall 2007

Time: TTh 2:40-3:55
Place: 327 Mudd

Professor: Julia Hirschberg
Office Hours: Tu 4-5, We 10-11
Email: julia@cs.columbia.edu
Phone: 212-939-7114

Teaching Assistant: Frank Enos
Office Hours: Tu/Th 1-2
Email: frank@cs.columbia.edu
Phone: 212-939-7193

Announcements || Academic Integrity || Contributions || Description || Links to Resources || Requirements || Syllabus || Text

Announcements:

  1. Check Columbia Courseworks for announcements, your grades (only you will see them), and discussion. Professor Hirschberg and your TA will monitor the discussion lists to answer questions.
  2. If you are interested in doing NLP research projects for credit, please let Professor Hirschberg know. The NLP group often has research opportunities available.  Other postings may be found at this location.

Description:

This course provides an introduction to the field of computational linguistics, aka natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question-answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structures), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems computational linguists currently work on, including analyzing and extracting information from large online corpora.

Textbook:

Speech and Language Processing by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library. Please check the online errata for the text for each chapter as you read it. 

Requirements:

Four homework assignments, a midterm, and a final exam. Each student is allowed a total of 4 late days across all homework assignments, no questions asked; once those are used, 10% per late day will be deducted from the homework grade unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.
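
To make the late-day arithmetic concrete, here is a minimal Python sketch of one reading of the policy. It assumes late days are counted in whole days, that free late days are spent before any penalty applies, and that each penalized day removes 10% of the homework grade. The function name `late_penalty` and its parameters are illustrative only and are not part of any course software.

```python
def late_penalty(days_late, free_days_left, rate=0.10):
    """Return (fraction of the grade deducted, free late days remaining).

    Assumption: free late days are used up first; every further late day
    costs `rate` of the homework grade, capped at 100%.
    """
    used = min(days_late, free_days_left)
    penalized_days = days_late - used
    deduction = min(1.0, rate * penalized_days)
    return deduction, free_days_left - used


# Hypothetical student: starts with 4 free late days,
# then submits two homeworks three days late each.
free = 4
d1, free = late_penalty(3, free)  # covered entirely by free late days
d2, free = late_penalty(3, free)  # one free day left, two penalized days
```

Under these assumptions the student loses nothing on the first homework and 20% on the second, which is why saving late days for real emergencies matters.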

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".

Homework submission procedure.

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

 

Syllabus:

 

| Week | Class | Topic | Reading | Assignments |
|------|-------|-------|---------|-------------|
| 1 | Sep 4 | Introduction and Course Overview | | |
| | Sep 6 | Natural Language and Formal Language: Regular Expressions and Finite State Automata | Ch 1-2 | |
| 2 | Sep 11 | Words and Their Parts: Morphology | Ch 3.1 | |
| | Sep 13 | Word Construction and Analysis: Morphological Parsing | Ch 3:2-6 | HW1 assigned; text inputs file1 and file2 |
| 3 | Sep 18 | Word Tokenization, Pronunciation and Spelling | Ch 5:1-8 | |
| | Sep 20 | N-grams and Language Models | Ch 6 | |
| 4 | Sep 25 | Word Classes and POS Tagging; Questions | Ch 8 | HW1 due |
| | Sep 27 | Machine Learning Approaches to NLP and Introduction to Weka | Jansche&Abney02 | HW2 assigned; FAQ |
| 5 | Oct 2 | Context-Free Grammars | Ch 9 | Guest Speaker: Owen Rambow |
| | Oct 4 | Parsing with Context-Free Grammars | Ch 10 | |
| 6 | Oct 9 | Probabilistic and Lexicalized Parsing | Ch 12 (be sure to replace Figure 12.3 with the new version) | |
| | Oct 11 | Catching Up; Representing Meaning | Ch 14 | |
| 7 | Oct 16 | Semantic Analysis and Midterm Review | Ch 15:1,4-6 | |
| | Oct 18 | Midterm Examination | Sample midterm | |
| 8 | Oct 23 | Relations Among Words | Ch 16:1-2 | |
| | Oct 25 | WordsEye | Ch 16:3-5 | Guest Speaker: Robert Coyne |
| 9 | Oct 30 | Word Sense Disambiguation | Ch 17:1-2 | |
| | Nov 1 | Information Retrieval and Information Extraction | Ch 17:3-5 | HW2 due (2pm); how to submit |
| 10 | Nov 6 | Holiday | | |
| | Nov 8 | Pronouns and Reference Resolution | Ch 18.1 | HW3 assigned |
| 11 | Nov 13 | Algorithms for Reference Resolution | | |
| | Nov 15 | Text Coherence and Discourse Structure | Ch 18.2-18.5; Grosz&Sidner86 | Guest Speaker: Frank Enos |
| 12 | Nov 20 | Turn-taking and Grounding | Ch 19:1 | Guest Speaker: Agustín Gravano |
| | Nov 22 | Thanksgiving Holiday | | |
| 13 | Nov 27 | Dialogue Systems | Ch 19:2-6 | |
| | Nov 29 | Summarization and Generation | Ch 20 | HW3 due |
| 14 | Dec 4 | Machine Translation | Ch 21 | Guest Speaker: Nizar Habash |
| | Dec 6 | Final Review | | |
| | Dec 11-13 | Study Days | | |
| | Dec 14-21 | Final Exams | | |


Links to Resources (cf. also resources available from the text homepage):

General:

  1. Karen Chung Language and Linguistics links
  2. CatSpeak

Places to look up definitions and descriptions of terminology:

  1. Oxford Dictionary of Linguistics
  2. Interesting Language Factoids and Non

Chapters 1 and 2:

Try out one of the many versions of Eliza on the web.

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Later Chapters:

  1. Appelt and Israel's information extraction tutorial (IJCAI-99).
  2. FrameNet.

Chapter 19:

  1. Ask Jeeves -- a search engine that answers questions in plain English.
  2. Answer Bus -- another Q/A system.
  3. Columbia's NewsBlaster summarizer
  4. IBM summarizer demo (canned)
  5. Systran machine translation (also in use at Babelfish)
  6. AT&T Labs - Research Finite State Machine Library
  7. Michael Collins' Parser
  8. On-line dictionaries in many languages.
  9. WordNet
  10. FrameNet
  11. CoBuildDirect Corpus
  12. AT&T's SCANMail voicemail browsing/search system
  13. DiaLeague 2001 -- includes a link to an online dialogue system demo.
  14. James Allen's Dialogue Modeling for Spoken Language Systems ACL 1997 Tutorial
  15. Festival speech synthesizer demo and links to other TTS systems
  16. Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial
