CS 86998/EE 6898/EECS 6898: Topics - Information Processing: From Data to Solutions, Fall 2013
Time: Friday, 1:10-3:00pm
Place: CSB 453 (CS Conference Room)
Professors
Shih-fu Chang (Office Hours: TBD) sfchang_AT_ee.columbia.edu, 212-854-6894
Noemie Elhadad (Office Hours: TBD) noemie_AT_dbmi.columbia.edu
Teaching Assistant: Jessica Ouyang (Office Hours: TBD) ouyangj_AT_cs.columbia.edu
Description
This course is designed for participants in the NSF IGERT program "From Data to Solutions". Students in the seminar may be IGERT Trainees, IGERT affiliates, or other students having the permission of one of the instructors. The course will consist of a series of presentations by faculty and staff at Columbia and CUNY who will describe interesting problems involving very large amounts of data (text, audio, image, video) that require interdisciplinary collaboration with faculty and students in Computer Science, Electrical Engineering, Statistics, Psychology, Biomedical Informatics, Business and Journalism. Students taking the course will complete short reading assignments for each class, turn in one-page reports on each of the presentations, and prepare a final longer report on one of the problems presented as a final project. Actual experimental implementations will be welcome, but not mandatory. Some proposed projects may be selected and invited to continue in the following semester or summer under the supervision of the instructors or other participating faculty or researchers from industry. There are no prerequisites for the course and no exams; however, students who are members of the IGERT: From Data to Solutions project (Trainees and Affiliates) will have preference in enrollment. This is a required course for IGERT Trainees.
Requirements/Assignments
Students will be expected to complete all reading assignments before the class for which they are assigned. Students will prepare short reports on each of the presentations. These must be submitted in CourseWorks before the following class. Each student will prepare a longer report outlining an approach to one of the interdisciplinary problems described in the presentations. There will be no midterm or final exam. Grades will be based on class participation, weekly short reports, and the final report.
A guide to weekly reporting can be found here.
An example can be found here.
Information on final reports and presentations can be found here and here.
Grading
Class participation: 30%
Short Reports: 30%
Final Report: 40%
Academic Integrity
Readings
Required readings are available online from links in the syllabus below.
Announcements
Reports for Week 4 are due on Friday, 04 Oct., at 1pm on CourseWorks. Please post any questions for the speaker on the Discussion Board.
Readings for Week 5 are posted.
Resources
Syllabus
| Date | Topic | Readings | Presenters |
|---|---|---|---|
| Week 1 (9/6) | Title: Biomedical engineering and informatics applications in the intensive care unit Description: Discussion of the increasing and essential role of biomedical engineering and biomedical informatics in intensive care medicine. The talk will span medical devices for patient monitoring, device integration and data collection, data analysis, and data visualization to facilitate medical decision making. Students should come to appreciate the tremendous unmet need for engineers in healthcare and the potential impact they could have on improving the lives of our sickest patients. | Hemphill11, Claassen13, Cohen10 | Michael Schmidt is an Assistant Professor of Clinical Neuropsychology in Neurology at Columbia University College of Physicians and Surgeons. Dr. Schmidt received his undergraduate degree in psychology from Michigan State University and his doctorate in Neuropsychology from the City University of New York. Dr. Schmidt completed a post-doctoral research fellowship in the Division of Critical Care Neurology at Columbia University that led to his current position in 2005. In 2009, Dr. Schmidt received a 3-year CTSA K12 career development award from the Columbia University Irving Institute for Clinical and Translational Research. He completed a Master's degree in Biostatistics: Patient-Oriented Research from the Columbia School of Public Health in 2011. Dr. Schmidt is the Director of the Neuro-ICU Neuromonitoring and Informatics program and the Columbia University Undergraduate Research Internship in Neurology and Neurosurgery. Dr. Schmidt's interests concentrate on personalized medicine in the Neuro-ICU, including generation of patient-specific physiological targets and early detection of secondary complications related to critical brain injuries through real-time analysis of neurophysiological monitoring data, the use of clinical informatics to support patient management decisions within the intensive care unit, and identifying modifiable factors that drive health outcomes following critical brain injuries. His research as a co-investigator to determine patient status utilizing multimodal neuromonitoring data from critical brain injury patients is supported by the Dana Foundation. |
| Week 2 (9/13) | Speaker: Steve Lohr Title: The Age of Big Data Description: Big Data is a vague term, used loosely, if often, these days. But put simply, the catchall phrase means three things. First, it is a bundle of technologies. Second, it is a potential revolution in measurement. And third, it is a point of view, or philosophy, about how decisions will be -- and perhaps should be -- made in the future. This talk will elaborate on those three themes. It will also describe the historical context for the technologies and mindset that now fly under the banner of Big Data, and touch on the promise and pitfalls of this approach to decision making. Speaker: Mark Hansen Title: Database and/as narrative Description: John Tukey wrote that the clever data analyst need only "listen to what his data had to tell him." In this talk, I will present a series of art projects that pull stories from data. "Before Us Is the Salesman's House" is a recent work commissioned by eBay as part of the Zero1 Festival in San Jose, CA. Through it, Jer Thorp and I examine how to literally "read" one data set through another. "Exit," developed for the Fondation Cartier pour l'art contemporain, Paris, builds on curator and cultural theorist Paul Virilio's notion that what most defines humanity today are our patterns of migration. The installation visualizes the global movement of people, both forced and voluntary and due to various factors (whether political, economic, or environmental), through a series of six panoramic narratives displayed over the course of 42 minutes. Finally, I will describe "Shuffle," a performance created for the New York Public Library's Centennial celebration. The piece is a site-specific mash-up of three texts performed by the Elevator Repair Service over the last decade, The Great Gatsby, The Sound and the Fury, and The Sun Also Rises, simultaneously. | Halevy09, The Fourth Paradigm (Foreword, pp xi-xv; Jim Gray on eScience, pp xvii-xxxi), Lohr13 | Steve Lohr reports on technology, business, and economics. He was a foreign correspondent for the Times for a decade and served brief stints as an editor, before covering technology, starting in the early 1990s. In 2013, he was part of the team awarded the Pulitzer Prize for Explanatory Reporting "for its penetrating look into business practices by Apple and other technology companies that illustrates the darker side of a changing global economy for workers and consumers." He has written for magazines including The New York Times Magazine, The Atlantic Monthly, and The Washington Monthly. He is the author of a history of computer programming, "Go To: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Maverick Scientists and Iconoclasts -- The Programmers Who Created the Software Revolution" (Basic Books, 2001; paperback, 2002). Mark Hansen is a professor in the Columbia University Graduate School of Journalism. |
| Week 3 (9/20) | Speaker: John Paisley Title: Variational Inference and Big Data Description: This talk presents stochastic variational inference, a scalable algorithm for approximating posterior distributions. Stochastic variational inference lets one apply complex Bayesian models to massive data sets. This technique applies to a large class of probabilistic models and outperforms traditional batch variational inference, which can only handle small data sets. Stochastic inference is a simple modification to the batch approach, so a significant part of the discussion will focus on reviewing this traditional batch inference method. Speaker: Daniel Hsu Title: Machine learning and privacy Description: Many important applications of machine learning crucially rely on sensitive information collected about individuals (e.g., shopping habits, medical records, financial histories). The failures of conventional anonymization techniques have caused public embarrassment, and therefore indicate that privacy should be a first-order concern in the design of machine learning methods. This talk will give an overview of some recent research along these lines. | Hoffman13, Dwork10 | John Paisley is an assistant professor of electrical engineering at Columbia University. He received his Ph.D. in electrical engineering from Duke University in 2010 and did post-docs in the computer science departments at Princeton University and UC Berkeley. He is interested in machine learning, particularly probabilistic models and inference techniques, Bayesian nonparametrics, dictionary learning and topic modeling. Daniel Hsu is an assistant professor in the Department of Computer Science at Columbia University. Previously, he was a postdoc at Microsoft Research New England from 2011 to 2013; before that, he was a postdoc with the Department of Statistics at Rutgers University and the Department of Statistics at the University of Pennsylvania from 2010 to 2011, supervised by Tong Zhang and Sham M. Kakade. He received his Ph.D. in Computer Science in 2010 from the Department of Computer Science and Engineering at UC San Diego, where he was advised by Sanjoy Dasgupta. He received his B.S. in Computer Science and Engineering in 2004 from the Department of Electrical Engineering and Computer Sciences at UC Berkeley. His research interests are in algorithmic statistics and machine learning. |
| Week 4 (9/27) | Title: Identifying Deception from Speech Abstract: There has been considerable interest in recent years in automatic methods of detecting deception to supplement current human and polygraph approaches, especially using new sources of information. Evidence of deception appears in many dimensions: biometric information, body gesture, facial expression, written words, and speech characteristics. Our focus is on detecting deception from acoustic/prosodic and lexical cues in speech. We are collecting large corpora of deceptive and non-deceptive speech to study how speakers vary their productions when lying and telling the truth. Our machine learning experiments predicting deception achieve performance which compares favorably with the performance of human judges on the same data and task. We find that personality may be a key factor in successful human judgments, based on our perception studies, and hypothesize that it may also play an important role in the individual differences we find in production. | Hirschberg05, Enos06 | Julia Hirschberg (Engineering) |
| Week 5 (10/4) | Title: Transforming the Impossible to the Natural Abstract: Reading science fiction over the past one hundred years, one sees many seemingly impossible machines and services, which are now not only widely available, but have become accepted as natural. In this talk, I will share examples that show how technologies developed in research labs have impacted real life user experiences. For example, body gesture, speech, natural user intent understanding, and other new usage scenarios have all recently impacted how users utilize computing. Looking forward, I see exciting opportunities for research to further extend what is considered natural when using computers. What's natural in computing at the end of the 21st century will be drastically different from what we find common today. | Chen13, Wang12, Weng13, Zheng, Zhang13 | Hsiao-Wuen Hon (Microsoft Research Asia) |
| Week 6 (10/11) | | | Gary Natriello (Teachers College) |
| Week 7 (10/18) | | | Laura Kurgan (Architecture) |
| Week 8 (10/25) | | | Smaranda Muresan (CCLS) |
| Week 9 (11/1) | | | Barbara Grosz (Harvard) |
| Week 10 (11/8) | | | Maria Feng (Civil Engineering) |
| Week 11 (11/15) | | | Samantha Kleinberg (Stevens Institute of Technology) |
| Week 12 (11/22) | | | Nick Genes, Mike Chary (Mt. Sinai) |
| Finals week (12/6) | | | Jeff Sears/Orin Herskowitz (Legal and CTV) |


