COMS 4705: Natural Language Processing, Fall 2010

Time: Tues/Thurs 2:40-3:55
Place: 535 Mudd

Professor Julia Hirschberg (Office Hours Tu 4:15-6:15, CEPSR 705)
julia@cs.columbia.edu, 212-939-7114

Teaching Assistants

Mohamed Altantawy (ma2795@columbia.edu, Office Hours: W 4:30-5:30; Th 4-5, Speech Lab, CEPSR 7LW1)
Wei-Yun Ma (wm2174@columbia.edu, Office Hours: Tu 10-12, CEPSR 725)


Announcements

  1. Check Columbia Courseworks for announcements, your grades (only you will see them), and discussion. Professor Hirschberg and the TAs will monitor the discussion lists to answer questions.
  2. If you are interested in doing NLP research projects for credit, please let Professor Hirschberg know. The NLP group often has research opportunities available. Other postings may be found at this location.
  3. The link to the CVN website for on-campus students is here.


Description

This course provides an introduction to the field of computational linguistics, also known as natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structures), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems computational linguists currently work on, including analyzing and extracting information from large online corpora.


Text

Speech and Language Processing by Jurafsky and Martin, 2nd edition. It will be available from the University Bookstore, as well as from Amazon and other online providers, and should also be on reserve in the Engineering Library. Please check the online errata for your version of the text for each chapter as you read it. Note that readings marked with '*' are optional.


Requirements

Three homework assignments, a midterm, and a final exam. Each student in the course is allowed a total of 5 late days on homework assignments (except Homework 1) with no questions asked; after that, 10% per late day will be deducted from the homework grade, unless you have a note from your doctor. Do not use these up early! Save them for real emergencies. Class participation will also be a factor in your final grade. Here are the weights for the components:

HW1    HW2    HW3    Midterm    Final    Class Participation
10%    20%    20%    15%        25%      10%

CVN students' Class Participation points will be distributed over the other components.  All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account". 
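As a rough illustration of how the weights and the late-day policy above combine, here is a minimal Python sketch. It is not an official grading script. The function names and sample scores are hypothetical, and two readings of the policy are assumptions: that each late day beyond the five free ones removes 10% of that homework's own score, and that CVN students' Class Participation weight is spread proportionally over the other components.

```python
# Component weights from the table above (percent of the final grade).
WEIGHTS = {"HW1": 10, "HW2": 20, "HW3": 20,
           "Midterm": 15, "Final": 25, "Participation": 10}

FREE_LATE_DAYS = 5      # total across homework assignments (not usable on Homework 1)
PENALTY_PER_DAY = 0.10  # ASSUMPTION: fraction of the homework score lost per extra day


def late_adjusted(score, late_days_used, late_days_before=0):
    """Return a homework score after the late penalty.

    Only days beyond the remaining free allowance are penalized.
    """
    free_left = max(0, FREE_LATE_DAYS - late_days_before)
    penalized_days = max(0, late_days_used - free_left)
    return score * max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)


def final_grade(scores, cvn=False):
    """Weighted average of component scores, each on a 0-100 scale."""
    weights = dict(WEIGHTS)
    if cvn:
        # ASSUMPTION: participation weight is redistributed proportionally.
        dropped = weights.pop("Participation")
        scale = 100 / (100 - dropped)
        weights = {name: w * scale for name, w in weights.items()}
    return sum(weights[name] * scores[name] for name in weights) / 100


# Hypothetical scores, for illustration only.
scores = {"HW1": 90, "HW2": late_adjusted(85, late_days_used=2),
          "HW3": 80, "Midterm": 75, "Final": 88, "Participation": 100}
print(final_grade(scores))
print(final_grade(scores, cvn=True))
```

If the penalty is instead meant as 10 percentage points per day, or the CVN redistribution is not proportional, only the two lines marked ASSUMPTION need to change.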

Homework submission procedure:

Academic Integrity:

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date. Read/write protect your homework at all times.


Syllabus

Date    Topic    Reading    Assignments

Week 1

Sep 7 Introduction and Course Overview    
Sep 9 Natural Language and Formal Language: Regular Expressions and Finite State Automata Ch 1-2  

Week 2

Sep 14 Words and Their Parts Ch 3 HW1 Q/A assigned (Data: WSJ article [pos] [plain txt] [guide2pos] [sample input] [sample output])

Sep 16 N-grams and Language Models Ch 4  

Week 3

Sep 21 POS Tagging Ch 5  
Sep 23 HMMs Ch 6  

Week 4

Sep 28 Machine Learning Approaches to NLP I (Guest Speaker: Sameer Maskey) Ch 23.1-3; *ML Survey
Sep 30 ML Approaches II (Guest Speaker: Sameer Maskey) Ch 6.6-6.8 HW1 due Oct 1, midnight
Weka Tutorial; Weka Download; Weka example

Week 5

Oct 5 Syntax Ch 12-12.4 HW2 Data Mining assigned
Oct 7 Context Free Grammars Ch 12.5-12.10

Week 6

Oct 12 Syntactic Parsing Ch 13
Oct 14 Shallow Parsing and Midterm Review

Week 7

Oct 19 Midterm Examination (Sample midterm)
Oct 21 Statistical Parsing (Guest Speaker: Michael Collins) Ch 14

Week 8

Oct 26 Representing Meaning Ch 17
Oct 28 Lexical Semantics (Guest Speaker: Robert Coyne) Ch 19

Week 9

Nov 2 University Holiday    
Nov 4 Computational Lexical Semantics Ch 20 HW2 Data Mining due Nov 5, 11:59pm

Week 10

Nov 9 Computational Lexical Semantics Ch 21-21.2 HW3 assigned
Nov 11 Computational Discourse    

Week 11

Nov 16 Reference Resolution Ch 21.3-21.10  
Nov 18 Information Extraction Ch 22  

Week 12

Nov 23 Question Answering Ch 23-23.2  
Nov 25 Thanksgiving Holiday

Week 13

Nov 30 Summarization (Guest Speaker: Kathy McKeown) Ch 23.3-23.8
Dec 2 Machine Translation (Guest Speaker: Nizar Habash) Ch 25

Week 14

Dec 7 Dialogue Systems Ch 24-24.8  
Dec 9 Final Review HW3 due Dec 10, 11:59pm

Week 15

Dec 14-15 Study Days
Dec 16, 1:10-4pm Final Exam in 535 Mudd

Links to Resources

See also the resources available from the text homepage.

Places to look up definitions and descriptions of terminology:

  1. Oxford Dictionary of Linguistics
  2. Interesting Language Factoids and Non

Other resources

  1. Karen Chung's Language and Linguistics links
  2. CatSpeak
  3. Check out Eliza
  4. AT&T Labs - Research Finite State Machine Library
  5. Appelt and Israel's information extraction tutorial (IJCAI-99)
  6. FrameNet
  7. Ask Jeeves -- a search engine that answers questions in plain English
  8. Answer Bus -- another Q/A system
  9. Columbia's NewsBlaster summarizer
  10. IBM summarizer demo (canned)
  11. Systran machine translation (also in use at Babelfish)
  12. Michael Collins' Parser
  13. On-line dictionaries in many languages
  14. WordNet
  15. CoBuildDirect Corpus
  16. AT&T's SCANMail voicemail browsing/search system
  17. DiaLeague 2001 -- includes a link to an online dialogue system demo
  18. James Allen's Dialogue Modeling for Spoken Language Systems ACL 1997 Tutorial
  19. Festival speech synthesizer demo and links to other TTS systems
  20. Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial


Julia Hirschberg
Professor, Computer Science

Columbia University
Department of Computer Science
1214 Amsterdam Avenue
M/C 0401
450 CS Building
New York, NY 10027

email: julia@cs.columbia.edu
phone: (212) 939-7114

