Teaching

CS 6998/EE 6898/EECS 6898: Topics - Information Processing: From Data to Solutions, Fall 2014

Time: Friday, 1:10-3:40pm
Place: MUDD 825

Professors

Shih-fu Chang (Office Hours: Friday 4-5pm) sfchang_AT_ee.columbia.edu, 212-854-6894

Julia Hirschberg (Office Hours: Monday 3-4pm) julia_AT_cs.columbia.edu, 212-939-7004

Noemie Elhadad (Office Hours: by appointment) noemie.elhadad_AT_columbia.edu, 212-305-0509

Teaching Assistants

Laura Willson (Office Hours: Wednesday 11am-1pm, 7LW3 CEPSR/Shapiro) willson_AT_cs.columbia.edu

Anna Prokofieva (Office Hours: Tuesday 12pm-1pm, 7LW3 CEPSR/Shapiro) prokofieva_AT_cs.columbia.edu


Description

This course is designed for participants in the NSF IGERT program "From Data to Solutions". Students in the seminar may be IGERT Trainees, IGERT Affiliates, or other students with the permission of one of the instructors. The course will consist of a series of presentations by faculty and staff at Columbia and CUNY, who will describe interesting problems involving very large amounts of data (text, audio, image, video) that require interdisciplinary collaboration with faculty and students in Computer Science, Electrical Engineering, Statistics, Psychology, Biomedical Informatics, Business, and Journalism. Students taking the course will complete short reading assignments for each class, turn in a one-page report on each presentation, and, as a final project, prepare a longer report on one of the problems presented. Experimental implementations are welcome but not mandatory. Some proposed projects may be selected and invited to continue in the following semester or summer under the supervision of the instructors, other participating faculty, or researchers from industry. There are no prerequisites and no exams; students will be selected for the class based on a questionnaire administered on the first day of class. Students who are members of the IGERT: From Data to Solutions project (Trainees and Affiliates) will have preference in enrollment. This is a required course for IGERT Trainees.

Requirements/Assignments

Students will be expected to complete all reading assignments before the class for which they are assigned. Students will prepare a short report on each presentation; these must be submitted in CourseWorks before the following class. Each student will also prepare a longer report outlining an approach to one of the interdisciplinary problems described in the presentations. There will be no midterm or final exam. Grades will be based on class participation, the weekly short reports, and the final report.

A guide to weekly reporting can be found here.

An example can be found here.

Information on project proposals, final project reports and presentations can be found here, here and here.

Grading

Class participation: 30%

Short reports: 30%

Final report: 40%

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.

Readings

Required readings are available online from links in the syllabus below.

Announcements

The last weekly report will be for Week 12 (on Annie Dorsen's talk, due on 11/28). There is no report for Week 13.

All final reports are due on Thursday, December 11th at 4pm. Final presentations will be held on Thursday, December 11th from 4-6:30pm and on Friday, December 12th from 1:10-3:40pm.

Final project proposals are due at noon on Friday, October 31st.

Resources

Syllabus

Date Topic Readings Presenters
Week 1 (9/5)
Introduction to the Course
None
Shih-fu Chang, Noemie Elhadad, Julia Hirschberg
Week 2 (9/12)
The State of Video Journalism
video.towcenter.org
Duy Linh Tu
Week 3 (9/19)
Learning High Precision Text Normalization Systems from (Mostly) Unlabeled Data
What to do about bad language on the internet, Hippocratic Abbreviation Expansion, page with Eisenstein's Social Media Data
Brian Roark
Week 4 (9/26)
The Latino Media Gap
The Latino Media Gap: A Report on the State of Latinos in U.S. Media
Frances Negrón-Muntaner
Week 5 (10/3)
NLM resources for research and practice: an overview and an R&D application
UMLS Quick Start Guide, MEDLINE®/PubMed® Resources Guide, NLM Databases, Resources & APIs, E-utilities Quick Start, What can natural language processing do for clinical decision support?
Dina Demner-Fushman
Week 6 (10/10)
"shift +control" for Journalism
"Woman With 3 Jobs Died From Gas Fumes While Napping, Authorities Say", "Illuminating a Life Made Visible by Death", What Happens to #Ferguson Affects Ferguson: Net Neutrality, Algorithmic Filtering and Ferguson, Tow Center Report Post-Industrial Journalism, Introduction: The Transformation of American Journalism is Unavoidable, ProPublica's Losing Ground
Emily Bell
Week 7 (10/17)
Reverse engineering the neural mechanisms involved in robust speech processing
The Neural Code That Makes us Human, Selective cortical representation of attended speaker in multi-talker speech perception, Phonetic Feature Encoding in Human Superior Temporal Gyrus
Nima Mesgarani
Week 8 (10/24)
Computational Cameras
Computational Cameras: Redefining the Image, Computational Cameras: Convergence of Optics and Processing
Shree Nayar
Week 9 (10/31)
Great Exploitations: technological determinism and the National Security Agency
S. Landau, Making Sense of Snowden Part II: What's Significant in the NSA Surveillance Revelations, IEEE Security and Privacy, declaration of Edward Felten in ACLU vs. Clapper
Matthew Jones
Week 10 (11/7)
Experimental Methods in the Study of Culture
State of the Discipline Report, Literature is not Data: Against Digital Humanities, Patent on Automatically assigning medical codes using natural language processing, Qualcomm v. Broadcom: Implications for Electronic Discovery
Dennis Tenen
Week 11 (11/14)
No Class
Week 12 (11/21)
Algorithmic Theater
On Algorithmic Theatre, Manfred Mohr, Artist's Statement, Paris, 1971, Manfred Mohr, Artist's Statement, Paris, 1975
Annie Dorsen
Week 13 (12/5)
Distributional Semantics in IBM Watson
Building Watson: An Overview of the DeepQA Project, Watson: Beyond Jeopardy!, Videos on Watson, Deep QA publications website
Alfio Gliozzo
Week 14 (12/11, 12/12)
Final Presentations