COMS 4771 is a graduate-level introduction to machine learning. The course covers basic statistical principles of supervised machine learning, as well as some common algorithmic paradigms.
Lecture schedule
Past and near-term lectures are listed here, along with links to lecture slides, readings, and other materials. Topics of other planned lectures can be found in the course syllabus. Lecture recordings are available on CourseWorks.

Overview of machine learning (9/5)

slides, 2up
Dietterich overview article (through section 3)
(optional) Breiman’s “Two Cultures” article

Nearest neighbors (9/5, 9/7)

slides, 2up
(optional) [ESL] 2.3, 7.10, 13.3; [PC] 4.5
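As a quick illustration of the nearest-neighbor rule covered in this lecture, here is a minimal sketch of k-NN classification by majority vote under Euclidean distance; the function name and toy data are illustrative, not the course's reference implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training points (Euclidean distance). Illustrative helper only."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two clusters on the real line, stored as 1-d feature vectors.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X, y, np.array([0.15]), k=3)
```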

Classification using generative models (9/12)

slides, 2up
(optional) [ESL] 4.3; [PC] 2.1–2.6

Statistical models for prediction (9/14, 9/19)

slides, 2up
Error rate confidence intervals based on CLT approximation
Some uses of the binomial distribution
(optional) [ESL] 2.4; [PC] 2.3
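The linked note on CLT-based confidence intervals can be summarized in a few lines: treating test errors as i.i.d. Bernoulli draws, a normal-approximation interval for the true error rate follows from the sample proportion. A minimal sketch (the function name and numbers are illustrative):

```python
import math

def error_ci(num_errors, n, z=1.96):
    """Normal-approximation (CLT) 95% confidence interval for the true
    error rate, given num_errors mistakes on n held-out test points."""
    p_hat = num_errors / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)  # z * std. error
    return (max(0.0, p_hat - half), min(1.0, p_hat + half))

lo, hi = error_ci(12, 100)  # observed test error rate of 0.12
```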

Decision tree learning (9/21)

slides, 2up
(optional) [ESL] 9.2, 8.7; [PC] 8.2–8.4, 9.4.2, 9.5.1

Linear regression (9/26, 9/28)

slides, 2up
(optional) [ESL] 2.3.1, 3.1–3.2
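The core computation in these lectures, ordinary least squares, fits in a few lines of NumPy. A minimal sketch, assuming synthetic data from a known linear model (the data and seed are made up for illustration):

```python
import numpy as np

# Synthetic data roughly following y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 1 + 0.01 * rng.standard_normal(200)

A = np.hstack([X, np.ones((200, 1))])      # append intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # solves min_w ||Aw - y||^2
```

The recovered weights should be close to the true slope 2 and intercept 1.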

Linear classification (10/3, 10/5)

slides, 2up
Notes on linear separators
(optional) Excerpt from Calvino’s “If on a winter’s night a traveler”
(optional) [ESL] 4.4, 4.5

Feature maps and kernels (10/10, 10/12)

slides, 2up
Hardt and Recht’s chapter on “Representations and features”
(optional) Freund and Schapire’s “Large Margin Perceptron” paper
(optional) [ESL] 5.1–5.2, 5.8
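The central identity in this lecture is that a kernel evaluation equals an inner product of explicit feature maps. A minimal numerical check for the degree-2 polynomial kernel on R² (the feature map here is one standard choice, written out by hand):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map on R^2, chosen so that
    phi(x) . phi(z) = (1 + x . z)**2."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1, s * x1, s * x2, x1 * x1, s * x1 * x2, x2 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.25])
explicit = phi(x) @ phi(z)     # inner product in 6-dimensional feature space
kernel = (1 + x @ z) ** 2      # same value, without forming phi at all
```

Computing `kernel` takes O(d) time regardless of the (possibly huge) feature-space dimension, which is the point of the kernel trick.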

Inductive bias and regularization (10/17, 10/19)

slides, 2up
Overfitting in linear regression
(optional) [ESL] 3.4, 12.1–12.3; [PC] 5.11

Dimension reduction (10/24, 10/31, 11/2)

slides, 2up
Notes on SVD
Best fitting line
(optional) Visualization of power method
(optional) Notes on eigenvectors/eigenvalues
(optional) [ESL] 14.5.1, 14.5.4–14.5.5
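The "best fitting line" note's main fact is that the best-fitting direction through centered data is the top right singular vector of the data matrix. A minimal sketch on synthetic points concentrated along the direction (1, 1)/√2 (data and seed are illustrative):

```python
import numpy as np

# Points spread along (1, 1) with small isotropic noise.
rng = np.random.default_rng(1)
t = rng.standard_normal(300)
X = np.outer(t, [1.0, 1.0]) + 0.05 * rng.standard_normal((300, 2))

Xc = X - X.mean(axis=0)                      # center the data first
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v = Vt[0]                                    # direction of maximum variance
```

Up to sign, `v` should be close to (1, 1)/√2.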

Optimization by gradient methods (11/9, 11/14, 11/16)

slides, 2up
Simple implementation of autodiff
Notes on gradient descent (through section 2)
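As a companion to the gradient descent notes, here is a minimal sketch of gradient descent on the least-squares objective f(w) = ‖Aw − b‖²/(2n); the step size and iteration count are hand-picked for this toy problem, not a general recipe.

```python
import numpy as np

# Noiseless least-squares problem with a known solution w_true.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true

w = np.zeros(3)
eta = 0.1                                 # step size (chosen by hand)
for _ in range(500):
    grad = A.T @ (A @ w - b) / len(b)     # gradient of f at w
    w = w - eta * grad
```

Because the objective is strongly convex here, the iterates converge linearly to `w_true`.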

Calibration and bias (11/21)

slides, 2up
COMPAS article
Balanced error rate
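The balanced error rate linked above averages the per-class error rates, so it is not dominated by the majority class. A minimal sketch from confusion-matrix counts (this is one common definition for binary classification; the counts are made up):

```python
def balanced_error_rate(tp, fp, tn, fn):
    """Average of the false positive rate and false negative rate,
    computed from binary confusion-matrix counts."""
    fpr = fp / (fp + tn)   # fraction of true negatives misclassified
    fnr = fn / (fn + tp)   # fraction of true positives misclassified
    return (fpr + fnr) / 2

ber = balanced_error_rate(tp=90, fp=30, tn=70, fn=10)
```

Here the ordinary error rate is 40/200 = 0.2 as well, but the two coincide only because this example is class-balanced.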

Generalization theory (11/28, 11/30)

slides, 2up
(optional) Notes on margins

Neural networks (12/5)

slides, 2up
(optional) LeCun et al.’s “Efficient BackProp” paper
(optional) Fleuret’s “Little Book of Deep Learning”