CS 6998/EE 6898/EECS 6898: Topics - Information Processing: From Data to Solutions, Spring 2016

Time: Fridays 12pm-2pm
Place: CS Conference Room -- CSB 453

Professors

Shih-Fu Chang (Office Hours: Monday 4-5, CEPSR 709) sfchang_AT_ee.columbia.edu, 212-854-6894

Julia Hirschberg (Office Hours: Wednesday 3-5, CSB 450) julia_AT_cs.columbia.edu, 212-939-7004

Noemie Elhadad (Office Hours: Friday 11-12, CS lounge) noemie.elhadad_AT_columbia.edu, 212-305-0509

Teaching Assistant

Sarah Ita Levitan (Office Hours: Tuesday 11:30-12:30, CEPSR 7LW3) sarahita_AT_cs.columbia.edu

Survey

Students who would like to be considered for registration for this course must fill out this survey by the end of the first class lecture on 1/22.
The instructors will approve registration for a selected group of students with appropriate backgrounds after reviewing the survey responses.


Description

This course is designed for participants in the NSF IGERT program "From Data to Solutions".  Students in the seminar may be IGERT Trainees, IGERT affiliates, or other students having the permission of one of the instructors.  The course will consist of a series of presentations by faculty and staff at Columbia and CUNY who will describe interesting problems involving very large amounts of data (text, audio, image, video) that require interdisciplinary collaboration with faculty and students in Computer Science, Electrical Engineering, Statistics, Psychology, Biomedical Informatics, Business and Journalism.  Students taking the course will complete short reading assignments for each class, turn in 1.5-2 page reports on each of the presentations, and prepare a final longer report on one of the problems presented as a final project.  Actual experimental implementations will be welcome, but not mandatory. Some proposed projects may be selected and invited to continue in the following semester or summer under the supervision of the instructors or other participating faculty or researchers from industry.  There are no prerequisites for the course and no exams; students will be selected for the class based upon a questionnaire to be administered the first day of class.  Note that students who are members of the IGERT: From Data to Solutions project (Trainees and Affiliates) will have preference in enrollment.  This is a required course for IGERT Trainees.

Requirements/Assignments

Students will be expected to complete all reading assignments before the class.  Students will also prepare questions for the speaker before each talk based on their readings. After the talk, each student will prepare a 1.5-2 page report, which must be submitted in CourseWorks before the following class. In the weekly report, we ask you to briefly summarize the talk and discussion, and outline an approach to one of the interdisciplinary problems described in the presentations. 

There will be no midterm or final exam.  Grades will be based on class participation, weekly reports, and the final report.

A guide to weekly reporting can be found here.
An example can be found here.
Information on project proposals, final project reports and presentations can be found here, here and here.

Grading

Class participation: 20%

Weekly Reports: 40%

Final Report: 40%

*Absence policy: An unexcused absence will give you a zero for participation for that day. An excused absence will have no participation penalty.
Absences must be cleared with the professors in advance of the class being missed. Weekly reports are required even if you miss a class.
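
The weighting above amounts to a weighted average of the three components. The following is a minimal sketch of that arithmetic, not the instructors' actual procedure. It assumes every component is scored on a 0-100 scale and that participation is averaged over class sessions, with an unexcused absence counted as zero and an excused absence left out of the average. The function names and the per-session scoring convention are illustrative assumptions.

```python
from typing import Optional, Sequence

# Component weights taken from the Grading section.
WEIGHTS = {"participation": 0.20, "weekly_reports": 0.40, "final_report": 0.40}


def participation_score(sessions: Sequence[Optional[float]]) -> float:
    """Average per-session participation on a 0-100 scale.

    Hypothetical convention: None marks an excused absence (skipped, so no
    penalty); 0.0 marks an unexcused absence (a zero for that day).
    """
    counted = [s for s in sessions if s is not None]
    return sum(counted) / len(counted) if counted else 0.0


def course_grade(participation: float, weekly_reports: float, final_report: float) -> float:
    """Weighted course grade on a 0-100 scale."""
    return (
        WEIGHTS["participation"] * participation
        + WEIGHTS["weekly_reports"] * weekly_reports
        + WEIGHTS["final_report"] * final_report
    )


if __name__ == "__main__":
    # One excused absence (None) and one unexcused absence (0.0).
    part = participation_score([100.0, None, 0.0, 100.0])
    print(round(course_grade(part, weekly_reports=90.0, final_report=85.0), 2))
```

Under these assumptions an excused absence does not change the participation average, while an unexcused absence lowers it by contributing a zero for that session.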

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

Readings

Required readings are available online from links in the syllabus below.

Announcements

Excellent reports from week 4:

Resources

Syllabus

Date Topic Readings Presenters
Week 1 (1/22)
Introduction to the Course

Shih-Fu Chang, Noemie Elhadad, Julia Hirschberg
Week 2 (1/29)
Exploring Image, Video, and Multimedia in Large-Scale Data Applications
1. Structured exploration of who, what, when, and where in heterogeneous multimedia news sources
online demo (username: demomedia, passwd: demomedia)
2. Object-based visual sentiment concept analysis and application
online demo
3. EventNet: A large scale Structured Concept Library for Complex Event Detection in Video
online demo
4. Tweeting Cameras for Event Detection. International Conference on World Wide Web
Shih-Fu Chang
slides
Week 3 (2/5)
Cross-cultural Production and Detection of Deception from Speech
Cross-Cultural Production and Detection of Deception from Speech
Julia Hirschberg &
Sarah Ita Levitan
slides
Week 4 (2/12)
Mine Your Own Business: Using Text Mining in Business Applications
Mine Your Own Business: Market Structure Surveillance Through Text Mining
Ideation, Creativity and Prototypicality
Oded Netzer
slides
Week 5 (2/19)
When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning
When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning
Steve Bellovin
slides
Week 6 (2/26)
Biomedical Engineering and Informatics Applications in Critical Care
Multimodal monitoring and neurocritical care bioinformatics
Multimodality Monitoring: Informatics, Integration Data Display and Analysis
Michael Schmidt
slides
Week 7 (3/4)
Causal Inference from Complex Observational Data
A review of causal inference for biomedical informatics
Samantha Kleinberg
slides
Week 8 (3/11)
Leveraging small data to fuel, personalize, sustain, and study health and care
Leveraging Multi-Modal Sensing for Mobile Health: a Case Review in Chronic Pain
Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K)
Deborah Estrin
slides
Week 9 (3/18)
Spring Recess


Week 10 (3/25)
Summarizing the Patient Record and Modeling Diseases from EHR Observations
HARVEST, a longitudinal patient record summarizer
Learning probabilistic phenotypes from heterogeneous EHR data
Noemie Elhadad
slides
Week 11 (4/1)
Privacy in a Data-Driven World
Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence
Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit
Roxana Geambasu
slides
Week 12 (4/8)
Reverse engineering the neural mechanisms involved in robust speech processing
Phonetic feature encoding in human superior temporal gyrus
Selective cortical representation of attended speaker in multi-talker speech perception
Exploring How Deep Neural Networks Form Phonemic Categories
Nima Mesgarani
slides
Week 13 (4/15)
Methods for Identifying Public Health Trends
CS related:
Collective Supervision of Topic Models for Predicting Surveys with Social Media
SPRITE: Generalizing Topic Models with Structured Priors
Health related:
News and Internet Searches About Human Immunodeficiency Virus After Charlie Sheen’s Disclosure
Understanding Vaccine Refusal
Could Behavioral Medicine Lead the Web Data Revolution?
Mark Dredze
slides
Week 14 (4/22)
The Latino Disconnect
The Latino Disconnect
The Latino Media Gap
Frances Negron-Muntaner
slides
Week 15 (4/29)


No class
Week 16 (5/4, 5/5)
Order of Presentations
Day 1 voting
Day 2 voting
Final Presentations
12-2pm CS conference room