Introduction to Data Science

New York University - Spring 2018

Class is held in 60FA 110, Wed 2:00-3:40pm

Office hours: CDS 620

Wednesday 12-2pm: Lecturer, Iddo Drori

Tuesday 11am-1pm: Section Leader, Datta Sainath Dwarampudi

Friday 2-4pm: Grader, Samhita Damotharan

Thursday 2-4pm: Grader, Sai Anirudh Kondaveeti

Spring 2018 classes begin (Monday, January 22)

Lecture 1 (Wednesday, January 24): Introduction
Data collection, cleaning, storage, retrieval, learning, visualization.
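Readings: CASI (Computer Age Statistical Inference, Efron and Hastie) Epilogue, DSB (Data Science for Business, Provost and Fawcett) Ch 1-3.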
Science and data science, David M. Blei and Padhraic Smyth, PNAS 2017.
Optional: Large-scale physical activity data reveal worldwide activity inequality, Tim Althoff, Rok Sosič, Jennifer L. Hicks, Abby C. King, Scott L. Delp, and Jure Leskovec, Nature 2017.
Lab 1: Tableau
Homework 1: in Tableau, due February 5.

Lecture 2 (Wednesday, January 31): Supervised learning, fitting
Readings: DSB Ch 4-5.
Optional: CASI Ch 8.
Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States, Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei, PNAS 2017.
Lab 2: Python, NumPy, sklearn

Lecture 3 (Wednesday, February 7): Unsupervised learning, clustering, dimensionality reduction
Readings: DSB Ch 6.
Optional: Robust continuous clustering, Sohil Atul Shah and Vladlen Koltun, PNAS 2017.
Lab 3: Kaggle competition
Homework 2: Kaggle class churn competition, due February 19.

Lecture 4 (Wednesday, February 14): Performance measures
Readings: DSB Ch 7-9.
Optional: CASI Ch 12.
Computer-based personality judgments are more accurate than those made by humans, Wu Youyou, Michal Kosinski, and David Stillwell, PNAS 2015.
Lab 4: sklearn clustering

Lecture 5 (Wednesday, February 21): Text mining
Readings: DSB Ch 10.
Lab 5: Spelling correction
Homework 3: due March 5.

Lecture 6 (Wednesday, February 28): Feature extraction and selection, review for midterm
Readings: CASI Ch 16, DSB Ch 11-12.
Lab 6: sklearn feature extraction

Lecture 7 (Wednesday, March 7): Midterm in class

Spring Recess (Monday-Sunday, March 12-18)

Snow day (Wednesday, March 21): NYU closed, classes cancelled

Lecture 8 (Wednesday, March 28): Differentiable programming, neural networks
Readings: CASI Ch 18.
Term project: due April 25.
Lab: Kaggle

Lecture 9 (Wednesday, April 4): Deep neural networks, non-linear PCA
Readings: CASI Ch 19.
Lab: sklearn PCA

Lecture 10 (Wednesday, April 11): Bayesian and variational inference
Readings: CASI Ch 3.
Optional: Variational inference: a review for statisticians, David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe, JASA 2017.
Lab

Lecture 11 (Wednesday, April 18): Gradient boosting
Readings: CASI Ch 17.
Lab: XGBoost

Lecture 12 (Wednesday, April 25): Meta data and learning
Lab

Lecture 13 (Wednesday, May 2): Data visualization
Readings: Visualizations that really work, Scott Berinato, HBR 2016.
Optional: The visual display of quantitative information, Edward Tufte, 2001.
Lab: Python visualization packages, time series

Last day of Spring 2018 classes (Monday, May 7)

Final exam (Wednesday, May 9): 2:00-3:50pm, room 110.