COMS 6998: Empirical Methods of Data Science, Fall 2021

Professor Michelle Levine
Email: mlevine@cs.columbia.edu
Lectures: Fridays 12:10pm-2:00pm, 415 Schapiro CEPSR
Office Hours: Fridays 2-3pm at CSB Courtyard nook northwest 452B and by appointment

Teaching Assistant:

Newman Cheng

Email: nc2893@columbia.edu
Office Hours: Wednesdays 3-4pm at CSB Courtyard nook northeast 452A and by appointment



Prerequisites: One statistics course is required; COMS 4705 (NLP) is preferred.

Description

Empirical Methods of Data Science is a seminar for students seeking an in-depth understanding of how to conduct empirical research in computer science. In the first part of the seminar, we will discuss how to critically examine previous research, build and test hypotheses, and collect data in the most ethical and robust manner. As we explore different means of data collection, we will dive into ethical concerns in research. Next, we will explore how to analyze different data sets most effectively and how to present the data in engaging ways. In the last part of the seminar, we will hear from different researchers about the methods they use to conduct research, leading to further conversations about when and how to use particular research methods. The focus will be primarily on relatively small data sets, but we will also address big data. Students will complete homework assignments and a group research project (paper and presentation).

Grade Breakdown

Further details to be provided on Courseworks.

Absence Policy: An unexcused absence or unexcused late arrival will result in a zero for participation for that day. Absences and late arrivals must be cleared with the professor in advance of the class meeting. If excused, there will be no participation penalty. Please make sure you are aware of and follow the Attendance Policies and Missed Classes policy for this semester, which is found here.

Assignment Submission Policy: All assignments must be submitted through CourseWorks by the deadline. Late assignments will not be accepted and will receive a zero. You are required to check that your file uploaded properly. A corrupted file, or a zip file that does not open, will count as a late assignment and will receive a zero.

Academic Integrity
The SEAS academic integrity policy is found here.
The CS academic integrity policy is found here.

Courseworks and Ed Discussion
Students are responsible for actively checking Courseworks and Piazza.
Use Courseworks for: accessing readings and assignments; participating in instructor-led discussions; receiving email announcements
Use Piazza for: posting questions; forming student discussions

Schedule
Note: Schedule is subject to change. All changes will be posted below and announced through Courseworks.

Date | Topic | Assignments & Due Dates
Week 1 (9/10) | Introduction to the Course |
Week 2 (9/17) | The Scientific Method; Conducting a Literature Review |
Week 3 (9/24) | Scientific Method & Big Data; Designing a Study | Assignment 1 Due
Week 4 (10/1) | Data Collection Methods |
Week 5 (10/8) | Data Analysis Tools (NLP Demo) | Project Proposal Due
Week 6 (10/15) | Ethics |
Week 7 (10/22) | Ethics, Part 2 | Project Progress Report #1 Due
Week 8 (10/29) | Ethics, Part 3 |
Week 9 (11/5) | Data Analysis Techniques; Guest Speaker/Demo | Assignment 2 Due
Week 10 (11/12) | Graphing and Reporting Data; Guest Speaker/Demo | Project Progress Report #2 Due
Week 11 (11/19) | Presenting & Publishing Research; Guest Speaker/Demo | Project Rough Draft Due
Thanksgiving Holiday (11/26) | NO CLASS |
Week 12 (12/3) | Research in the Press | Assignment 3 Due
Week 13 (12/10) | Student Presentations | Submit Presentation Slides by 12/9
Finals Week | | Final Project Paper Due