CS 86998/EE 6898/EECS E6898: Topics - Information Processing: From Data to Solutions, Fall 2016

Time: Fridays 12:10pm-2:00pm
Location: CS conference room (CSB 453)


Shih-Fu Chang
sfchang_AT_ee.columbia.edu, 212-854-6894
Office hours: Friday, 2-3 pm

Julia Hirschberg
julia_AT_cs.columbia.edu, 212-939-7004
Office hours: TBA

Teaching Assistant

Tasha Nagamine
Office hours: Mondays and Wednesdays, 12-1 pm, Mudd 1339

First lecture

Students who would like to be considered for registration for this course must attend the first lecture.
The instructors will approve registration for a selected group of students after reviewing first-class attendance and the essay responses written on the first day of class.



This course is designed for participants in the NSF IGERT program "From Data to Solutions".  Students in the seminar may be IGERT Trainees, IGERT affiliates, or other students having the permission of one of the instructors.  The course will consist of a series of presentations by faculty and staff at Columbia and CUNY who will describe interesting problems involving very large amounts of data (text, audio, image, video) that require interdisciplinary collaboration with faculty and students in Computer Science, Electrical Engineering, Statistics, Psychology, Biomedical Informatics, Business and Journalism.  Students taking the course will complete short reading assignments for each class, turn in 1.5-2 page reports on each of the presentations, and prepare a final longer report on one of the problems presented as a final project.  Actual experimental implementations will be welcome, but not mandatory. Some proposed projects may be selected and invited to continue in the following semester or summer under the supervision of the instructors or other participating faculty or researchers from industry.  There are no prerequisites for the course and no exams; students will be selected for the class based upon a questionnaire to be administered the first day of class.  Note that students who are members of the IGERT: From Data to Solutions project (Trainees and Affiliates) will have preference in enrollment.  This is a required course for IGERT Trainees.


Students will be expected to complete all reading assignments before the class.  Students will also prepare questions for the speaker before each talk based on their readings. After the talk, each student will prepare a 1.5-2 page report, which must be submitted in CourseWorks before the following class. In the weekly report, we ask you to briefly summarize the talk and discussion, and outline an approach to one of the interdisciplinary problems described in the presentations. 

There will be no midterm or final exam.  Grades will be based on class participation, weekly reports, and the final report.

A guide to weekly reporting can be found here.
An example can be found here.
Information on project proposals, final project reports and presentations can be found here, here and here.

A list of projects from spring 2016 can be found here.


Class participation: 20%

Weekly reports: 40%

Final report: 40%

*Absence policy: An unexcused absence will give you a zero for participation for that day. An excused absence will have no participation penalty.
Absences must be cleared with the professors in advance of the class being missed. Weekly reports are required even if you miss a class.

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.


Required readings are available online from links in the syllabus below.


Excellent reports from week 4:



Week 1 (9/09)
Topic: Introduction to the Course
Presenter: Shih-Fu Chang

Week 2 (9/16)
Topic: Digital Communications and Data Analytics: Some Privacy, National Security, and Law Enforcement Issues
Readings: Excerpts from United States v. Jones, statement from the FBI Director and Deputy Attorney General on "Going Dark", excerpts from the Berkman Center report on Going Dark
Presenters: Matthew Waxman, Daniel Richman (slides)

Week 3 (9/23)
Topic: Closing the loop on data analysis
Readings:
  The Case for Data Visualization Management Systems
  Towards Perception-aware Interactive Data Visualization Systems
Presenter: Eugene Wu

Week 4 (9/30)
Topic: Strategies for the analysis of texts that span long historical periods
Readings:
  Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790-2014
Presenter: Peter Bearman

Week 5 (10/07)
Topic: Detecting and Characterizing Events
Readings:
  Detecting and Characterizing Events
Presenter: Matthew Connelly

Week 6 (10/14)
Topic: Thompson Sampling for learning in online decision making
Readings:
  Thompson Sampling for Contextual Bandits with Linear Payoffs
  Further Optimal Regret Bounds for Thompson Sampling
Presenter: Shipra Agrawal

Week 7 (10/21)
Topic: Primary Care Physician Shortages Could Be Eliminated Through Use of Teams, Nonphysicians, and Electronic Communication
Readings:
  Primary Care Physician Shortages Could Be Eliminated Through Use of Teams, Nonphysicians, and Electronic Communication
Presenter: Linda Green

Week 8 (10/28)
Topic: Online Harassment as a Form of Censorship
Readings:
  "A Honeypot for Assholes": Inside Twitter's 10-Year Failure to Stop Harassment
  Automatic identification of personal insults on social news sites
  Approve or Reject: Can You Moderate Five New York Times Comments?
Presenter: Susan McGregor

Week 9 (11/04)
Topic: Integrating Text as Data into the Social Sciences
Readings:
  The political economy of tax laws in the U.S. states
  Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech
Presenter: Suresh Naidu

Week 10 (11/11)
Topic: Exploring Image, Video, and Multimedia in Large-Scale Data Applications
Readings:
  Structured exploration of who, what, when, and where in heterogeneous multimedia news sources
  Large-scale visual sentiment ontology and detectors using adjective noun pairs (demo)
  EventNet: A large scale Structured Concept Library for Complex Event Detection in Video
  Tweeting Cameras for Event Detection. International Conference on World Wide Web
Presenter: Shih-Fu Chang

Week 11 (11/18)
Topic: Quantifying population dynamics using hidden relatedness
Readings:
  Whole population, genome-wide mapping of hidden relatedness.
  Length distributions of identity by descent reveal fine-scale demographic history.
Presenter: Itsik Pe'er

Week 12 (11/25)
No class (Thanksgiving break)
Final project proposals due at 12 pm

Week 13 (12/02)
Topic: Historians in the Laboratory
Readings:
  Historians in the Laboratory: Reconstruction of Renaissance Art and Technology in the Making and Knowing Project
Presenter: Pamela Smith

Week 14 (12/09)
Topic: Mobile Social Media and Demographics: Opportunities and Risks
Readings:
  FindYou: A Personal Location-Privacy Auditing Tool
  Linking Users Across Domains with Location Data: Theory and Validation
  "I don't have a photograph, but you can have my footprints." - Revealing the Demographics of Location Data
Presenter: Augustin Chaintreau

Week 15 (12/14)
Final project presentations, 12:30-3:30 pm, CS conference room