CS 4705: Introduction to Natural Language Processing, Fall 2007

Time: TTh 2:40-3:55
Place:

Professor: Julia Hirschberg
Office Hours: TBA
Email: julia@cs.columbia.edu
Phone: 212-939-7114

Teaching Assistant: Frank Enos
Office Hours: TBA
Email: frankl@cs.columbia.edu
Phone: 212-939-

Announcements || Academic Integrity || Contributions || Description || Links to Resources || Requirements || Syllabus || Text

Announcements:

Description:

This course provides an introduction to the field of computational linguistics, aka natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question-answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structures), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems computational linguists currently work on, including analyzing and extracting information from large online corpora.

Textbook:

Speech and Language Processing by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library. Please check the online errata for the text for each chapter as you read it. 

Requirements:

The course requires three homework assignments, a midterm, and a final exam. Each student is allowed a total of 4 late days across all homework assignments, no questions asked; after those are used, 10% per late day will be deducted from the homework grade unless you have a note from your doctor. Do not use these up early! Save them for real emergencies. Homework is due by midnight on the due date.
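As an illustration only, the late-day arithmetic can be sketched in Python. The function name and parameters below are hypothetical. The sketch assumes that free late days are spent in submission order and that each penalized day removes 10% of the score earned on that assignment. The policy above does not spell out either detail, so check with the instructor or TA. Doctor's-note exemptions are not modeled.

```python
def apply_late_policy(raw_scores, days_late, free_days=4, penalty_per_day=0.10):
    """Return adjusted homework scores under the assumed late-day policy.

    raw_scores: score earned on each homework, in submission order.
    days_late:  number of days each homework was late (0 if on time).
    """
    remaining = free_days
    adjusted = []
    for score, late in zip(raw_scores, days_late):
        covered = min(late, remaining)   # late days absorbed by the free allowance
        remaining -= covered
        penalized = late - covered       # late days that incur the deduction
        # Assumption: 10% of the earned score per penalized day, floored at zero.
        factor = max(0.0, 1.0 - penalty_per_day * penalized)
        adjusted.append(score * factor)
    return adjusted


# Homework 1 is 3 days late, homework 2 is 2 days late, homework 3 is on time.
adjusted_scores = apply_late_policy([90, 80, 100], [3, 2, 0])
print(adjusted_scores)
```

In this example, the first homework uses three of the four free days, the second uses the last free day and is penalized for one day, and the third is unaffected.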

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".

Homework submission procedure.

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

 

Syllabus:

 

| Week | Class | Topic | Reading | Assignments |
|------|-------|-------|---------|-------------|
| 1 | Sep 4 | Introduction and Course Overview | | |
| | Sep 6 | Natural Language and Formal Language: Regular Expressions and Finite State Automata | Ch 1-2 | |
| 2 | Sep 11 | Words and Their Parts: Morphology | Ch 3.1 | |
| | Sep 13 | Word Construction and Analysis: Morphological Parsing | Ch 3:2-6 | |
| 3 | Sep 18 | Word Tokenization, Pronunciation and Spelling | Ch 5:1-8 | |
| | Sep 20 | N-grams and Language Models | Ch 6 | |
| 4 | Sep 25 | Word Classes and POS Tagging | Ch 8 | |
| | Sep 27 | Machine Learning Approaches to NLP and Introduction to Weka | Jansche&Abney02 | |
| 5 | Oct 2 | Context-Free Grammars | Ch 9 | |
| | Oct 4 | Parsing with Context-Free Grammars | Ch 10 | |
| 6 | Oct 9 | Probabilistic and Lexicalized Parsing | Ch 12 (be sure to replace figure 12.3 with the new version) | |
| | Oct 11 | Representing Meaning | Ch 14 | |
| 7 | Oct 16 | Semantic Analysis and Midterm Review | Ch 15:1,4-6 | |
| | Oct 18 | Midterm Examination | Sample midterm | |
| 8 | Oct 23 | Relations Among Words | Ch 16:1-2 | |
| | Oct 25 | Roles Words Can Play | Ch 16:3-5 | |
| 9 | Oct 30 | Word Sense Disambiguation | Ch 17:1-2 | |
| | Nov 1 | Information Retrieval and Information Extraction | Ch 17:3-5 | |
| 10 | Nov 6 | Holiday | | |
| | Nov 8 | Pronouns and Reference Resolution | Ch 18.1 | |
| 11 | Nov 13 | Algorithms for Reference Resolution | | |
| | Nov 15 | Text Coherence and Discourse Structure | Ch 18.2-18.5; Grosz&Sidner86 | |
| 12 | Nov 20 | Turn-taking, Grounding and Dialogue | Ch 19:1 | |
| | Nov 22 | Thanksgiving Holiday | | |
| 13 | Nov 27 | Dialogue Systems | Ch 19:2-6 | |
| | Nov 29 | Natural Language Generation | Ch 20 | |
| 14 | Dec 4 | Machine Translation | Ch 21 | |
| | Dec 6 | Spoken Language Processing and Final Review | | |
| | Dec. 11-13 | Study Days | | |
| | Dec. 14-21 | Final Exams | | |

Links to Resources (cf. also resources available from the text homepage):

General:

Places to look up definitions and descriptions of terminology:

Chapters 1 and 2:

Try out one of the many versions of Eliza on the web.

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Later Chapters:

Chapter 19:
