CSEE E6600: From Data to Solutions, Spring 2018

Time: Fridays 10 AM to 12 PM

Location: CS Conference Room




Shih-Fu Chang

sfchang_AT_ee.columbia.edu, 212-854-6894

Office hours: Friday 2-3 PM, CEPSR 709


Julia Hirschberg

julia_AT_cs.columbia.edu, 212-939-7004

Office hours: Monday 3:15-4:15 PM, CSB 450


Teaching Assistant


Rose Sloan


Office hours: Tuesday 2-3 PM, CEPSR 7LW3

First lecture

Students who would like to be considered for registration for this course must attend the first lecture.

The instructors will approve registration for a selected group of students after reviewing attendance at the first class and the essay responses written on the first day of class.



This course is designed for participants in the NSF IGERT program "From Data to Solutions". Students in the seminar may be IGERT Trainees, IGERT Affiliates, or other students with the permission of one of the instructors. The course will consist of a series of presentations by faculty and staff at Columbia and CUNY, who will describe interesting problems involving very large amounts of data (text, audio, image, video) that require interdisciplinary collaboration with faculty and students in Computer Science, Electrical Engineering, Statistics, Psychology, Biomedical Informatics, Business, and Journalism. Students taking the course will complete short reading assignments for each class, turn in 1.5-2 page reports on each of the presentations, and prepare a longer final report on one of the problems presented as a final project. Actual experimental implementations are welcome but not mandatory. Some proposed projects may be selected and invited to continue in the following semester or summer under the supervision of the instructors or other participating faculty or researchers from industry.

There are no prerequisites for the course and no exams; students will be selected for the class based on a questionnaire administered on the first day of class. Note that students who are members of the IGERT: From Data to Solutions project (Trainees and Affiliates) will have preference in enrollment. This is a required course for IGERT Trainees.


Students will be expected to complete all reading assignments before the class.  Students will also prepare questions for the speaker before each talk based on their readings. After the talk, each student will prepare a 2-3 page report, which must be submitted in CourseWorks before the following class. In the weekly report, we ask you to briefly summarize the talk and discussion, and outline an approach to one of the interdisciplinary problems described in the presentations.

There will be no midterm or final exam. Grades will be based on class participation, weekly reports, and the final report.

A guide to weekly reporting can be found here.

An example can be found here.

Information on project proposals, final project reports and presentations can be found here, here and here.

A list of projects from Spring 2016 can be found here.


Relevant dates:

April 29: project proposals due

May 9: presentations

May 11: final report due



Class participation*: 20%

Weekly Reports: 40%

Final Report: 40%

*Absence policy: An unexcused absence will give you a zero for participation for that day. An excused absence will have no participation penalty.

Absences must be cleared with the professors in advance of the class being missed. Weekly reports are required even if you miss a class.

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.


Required readings are available online from links in the syllabus below.


Excellent reports from Spring 2016:

Richard Munoz

Patrick Rogan

Matthew Sisco

Jie Yuan








Week 1 (1/19)

Course Introduction and Essay Assignment



Week 2 (1/26)

History as a Data Science: Using Algorithms to Analyze Archives, Detect Events, and Identify State Secrets


Using Artificial Intelligence to Identify State Secrets

Mining Events with Declassified Diplomatic Documents

Matt Connelly

Week 3 (2/2)

Privacy in a Data-Driven World

Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence

Roxana Geambasu

Week 4 (2/9)

The Neural Signatures of Social Relations


Neural precursors of future liking and reciprocity

Neural mechanisms tracking popularity in real-world social networks

Peter Bearman

Week 5 (2/16)

Using Unstructured and Social Media Data for Business Decisions

Why understanding the political influence of social media extends beyond Russia

Idea Generation, Creativity, and Prototypicality

Mine Your Own Business: Market Structure Surveillance through Text Mining

When Words Sweat: Written Words Can Predict Loan Default

Oded Netzer

Week 6 (2/23)

Large-scale Multimedia Content Understanding and Retrieval



Event Specific Multimodal Pattern Mining for Knowledge Base Construction

Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs

Placing Broadcast News Videos in their Social Media Context using Hashtags

Shih-Fu Chang

Week 7 (3/2)

Robust Processing of Speech in Human Auditory Cortex 

Neural decoding of attentional selection in multi-talker environments without access to clean sources

Speaker-independent Speech Separation with Deep Attractor Network

Nima Mesgarani

Week 8 (3/9)

Cross-cultural Deception Detection from Speech

Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection

Cross-cultural Deception Detection

Sarah Ita Levitan

Week 9 (3/23)

Computational Models of Understanding Language in Social Context

The Role of Conversation Context for Sarcasm Detection in Online Interactions

Smaranda Muresan

Week 10 (3/30)

Making and Knowing: Creating a Digital Critical Edition of a 16th Century Manuscript

Historians in the Laboratory: Reconstruction of Renaissance Art and Technology in the Making and Knowing Project

Sophie Pitman and Tilmann Taape

Week 11 (4/6)

Ethical publishing in the age of big data










Emily Bell

Week 12 (4/13)

Collective Intelligence and Iterative Design

Design principles for visual communication

Lydia Chilton

Week 13 (4/20)

Social Media: Identifying and Addressing Bias and Discrimination

Algorithmic Glass Ceiling in Social Networks

"I don't have a photograph, but you can have my footprints." – Revealing the Demographics of Location Data

Linking Users Across Domains with Location Data: Theory and Validation

Augustin Chaintreau

Week 14 (4/27)

Project discussion