CS 6998/EE 6898/EECS 6898: Topics - Information Processing: From Data to Solutions, Fall 2012

Time: Friday, 1:10-3:00pm
Place: CSB 453 (CS Conference Room)


Julia Hirschberg (Office Hours: TBD) julia_AT_cs.columbia.edu, 212-939-7114

Shih-fu Chang (Office Hours: TBD) sfchang_AT_ee.columbia.edu, 212-854-6894

Teaching Assistant: TBD


Description

This course is designed for participants in the NSF IGERT program "From Data to Solutions". Students in the seminar may be IGERT Trainees, IGERT Affiliates, or other students with the permission of one of the instructors. The course will consist of a series of presentations by faculty and staff at Columbia and CUNY, who will describe problems involving very large amounts of data (text, audio, image, video) that require interdisciplinary collaboration with faculty and students in Computer Science, Electrical Engineering, Statistics, Psychology, Biomedical Informatics, Business, and Journalism. Students taking the course will complete short reading assignments for each class, turn in a one-page report on each presentation, and, as a final project, prepare a longer report on one of the problems presented. Experimental implementations are welcome but not mandatory. Some proposed projects may be selected for continuation in the following semester or summer, under the supervision of the instructors, other participating faculty, or researchers from industry. There are no prerequisites for the course and no exams; however, members of the IGERT "From Data to Solutions" project (Trainees and Affiliates) will have preference in enrollment. This is a required course for IGERT Trainees.

Requirements

Students are expected to complete all reading assignments before the class for which they are assigned. Students will prepare a short report on each presentation, to be submitted in CourseWorks before the following class. Each student will also prepare a longer report outlining an approach to one of the interdisciplinary problems described in the presentations. There will be no midterm or final exam. Grades will be based on class participation, the weekly short reports, and the final report.

A guide to weekly reporting can be found here.

For information on final reports and presentations, follow the links in the listing for the last session of the course, below.


Class participation: 30%

Short reports: 30%

Final report: 40%

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

Readings

Required readings are available online from links in the syllabus below.

Syllabus

Week 1 (9/7)
Topic: Disruption and Resurrection: New Models for Saving Narrative Journalism
Readings: Stories; optional: Shirky12, Fallows10
Presenters: Michael Schapiro (Journalism) and David Elson (Google)

Week 2 (9/14)
Topic: Mining Audio
Readings: Ellis&Lee06
Presenter: Dan Ellis (Electrical Engineering)

Week 3 (9/21)
Topic: Statistical Machine Translation (Slides1, Slides2, Slides3, Slides4)
Readings: Knight97
Presenter: Michael Collins (Computer Science)

Week 4 (9/28)
Topic: On a road to personalized therapy, understanding cancer through mining big data: A lesson on heterogeneity
Readings: Akaviaetal12
Presenter: Dana Pe'er (Biology)

Week 5 (10/5)
Topic: Mine Your Own Business
Readings: Netzeretal12
Presenter: Oded Netzer (Business)

Week 6 (10/12)
Topic: Workshop on Computational and Online Social Science
Note: This is an all-day workshop. Please register at http://caossnyc.org/registration if you are not already registered.

Week 7 (10/19)
Topic: Research Methods and Design
Readings: Gleitmanetal11
Presenter: Michelle Levine (Psychology)

Week 8 (10/26)
Topic: Observational Studies in Big Healthcare Data: Are They Any Good?
Readings: Ioannidis05
Presenters: David Madigan (Statistics), Tony Jebara (Computer Science), and Chris Wiggins (APAM)

Week 9 (11/2)
Topic: Data-Intensive Science: Methods for Reproducibility and Dissemination of Computational Results
Readings: Donohoetal09, Roundtable10, Stoddenetal10
Presenter: Victoria Stodden (Statistics)

Week 10 (11/9)
Topic: Redaction and Declassification of Government Archives
Readings: Trachtenberg, Burr, RFK papers (optional)
Presenter: Matt Connelly (History)

Week 11 (11/16)
Topic: Inferring Gold Standards from Multiple, Noisy Annotations
Presenter: Bob Carpenter (Statistics)

Week 12 (11/30)
Topic: Learning from electronic health records and online health
Readings: Hripcsak&Albers12
Presenters: Noemie Elhadad and George Hripcsak (Biomedical Informatics)

Week 13 (12/7)
Topic: Introduction to University Tech Transfer and Patents 101
Readings: UnsoldPatents, PatentasSword, SmartphoneDeals, CTVFAQ, ToPromoteInnovation (Ch. 1, pp. 1-2 and footnote 7, pp. 4-7, 9-12, 26-27, 31-35, and p. 37 (heading 2)); suggested: Ch. 2, pp. 3-8 (start with I. to end of section), and Ch. 3, pp. 1-2, 30-33 (start with IV to end of D)
Presenters: Orin Herskowitz (CTV) and Jeff Sears (OGC)

Finals week (12/14), 1:10-4pm
Final reports due
Student presentations