Machine Learning for Statistics: SDGB 7847
                                                      'What one fool could understand, another can.'
                                                                                                                                                  -- R.P. Feynman


Description

The course will give participants an opportunity to implement statistical models. We will cover numerical optimization techniques, including gradient descent, Newton's method, and quadratic programming solvers, and use them to fit linear and logistic regression, discriminant analysis, support vector machines, and neural networks. The second part of the course will focus on advanced methods for computing posterior distributions and motivate their appeal in Bayesian inference. We will survey importance and rejection sampling, the Metropolis algorithm, Gibbs sampling, and Sequential Monte Carlo. Students will be exposed to convex duality, constrained optimization, bias/variance decompositions, entropy, mutual information, KL divergence, maximum likelihood and maximum a posteriori estimation, Fisher scoring, the Laplace approximation, Markov chains, and saddle point methods, each of which will be reemphasized from a computational perspective.
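For a sense of the kind of implementation the course expects, here is a minimal sketch (not part of the official course materials) that fits logistic regression by plain gradient descent on synthetic data; the data, variable names, step size, and iteration count are illustrative assumptions.

    # Minimal sketch: logistic regression fit by gradient descent (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 3
    X = rng.normal(size=(n, d))                    # design matrix
    w_true = np.array([1.5, -2.0, 0.5])            # assumed "true" weights for the synthetic data
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.zeros(d)
    step = 0.5                                     # fixed step size (assumption)
    for _ in range(2000):
        p = sigmoid(X @ w)                         # predicted probabilities
        grad = X.T @ (p - y) / n                   # gradient of the average negative log-likelihood
        w -= step * grad

    print("estimated weights:", w)                 # should land near w_true

Newton's method, covered early in the course, replaces the fixed step with the inverse of the Hessian X.T @ np.diag(p * (1 - p)) @ X / n applied to the same gradient, and typically converges in far fewer iterations.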

Prerequisites

Multivariate Calculus, Linear Algebra, Probability, and Statistical Computing; that is, you should be able to program in, and have regular access to, Matlab, Python, or R.
Textbooks

The Elements of Statistical Learning [ESL] by Hastie, Tibshirani, and Friedman
Pattern Recognition and Machine Learning [PRML] by Bishop
Convex Optimization [CVX] by Boyd and Vandenberghe (not required, but a nice reference)
Grade Distribution

  • Homework 45%
  • Participation 5%
  • Midterm 20%
  • Final or Project 30%
Homework

Please typeset your homework using LaTeX, which is the standard for technical and scientific documents. You may visit www.latex-project.org to download a distribution and read the following tutorial to get started. A template for the homework is available here (cls, tex, pdf). A word of advice: start early on the graded homework! The instructor has worked through all of the problems and some are challenging. Please use Matlab, Python, or R. Each language has its own pros and cons, although if you know one you can probably learn the others easily. The standard academic honesty policy applies.
Tentative Course Outline

  • Overview, Least Squares and MLE
  • Constrained Optimization, Ridge Regression
  • Logistic Regression, Gradient Descent and Newton's Method
  • Lasso, Subgradient Methods, QP Solvers
  • SVMs, Primal and Dual forms, KKT conditions
  • Feed Forward Neural Networks, Backpropagation
  • Midterm
  • K-means, Expectation Maximization (EM)
  • Markov Chains, State Space Models
  • Message Passing, Kalman Filters
  • Quadrature, Laplace Approximation
  • Importance/Rejection sampling, Metropolis
  • Variational Inference
  • Final or Projects


Schedule

  • 1.17
    Topics: Method of Least Squares; Maximum Likelihood Estimation; Gaussian Integrals; Determinants
    Reading: PRML and ESL (1-3)
    Notes: Least Squares; The Gaussian Integral

  • 1.24
    Topics: Review Problem Session; Maximum Likelihood Examples; Constrained Optimization; Ridge Regression
    Reading: PRML and ESL (3)
    Notes: MLE; Lagrange Multipliers; Ridge Regression

  • 1.31
    Topics: Logistic Regression; Gradient Descent and Newton's Method; Taylor Expansions and Hessian Matrices
    Reading: PRML and ESL (4)
    Notes: Logistic Regression; Finding Roots
    Assignments: Homework 1 (data, Matlab, R, Python)

  • 2.7
    Topics: Discriminant Analysis; Eigenvalues and Eigenvectors; Lab Session
    Reading: PRML and ESL (4)
    Notes: LDA; Eigenvectors; Eigenfaces vs. Fisherfaces

  • 2.14
    Topics: Discriminant Analysis; Spectral Decompositions and PCA; Support Vector Machines
    Reading: PRML (6) and ESL (12)
    Notes: PCA, Spectral Methods; SVM Notes; Slides; Practical Tutorial

  • 2.21
    Topics: Support Vector Machines; Primal and Dual Forms; Linear and Quadratic Programs
    Reading: PRML (7) and ESL (12)
    Notes: SVM Tutorial; Duality in Optimization
    Assignments: Homework 2 (Python)

  • 2.28
    Topics: Kernels and Reproducing Kernel Hilbert Spaces; Interior Point Methods; Backpropagation
    Reading: PRML (7) and ESL (5.8, 12)
    Notes: Kernels, RKHS; Interior Point Methods; Kernel Demo; SVM Demo

  • 3.7
    Topics: Feed Forward Neural Networks; Backpropagation; Lab Session
    Reading: PRML (5) and ESL (11)
    Notes: Neural Networks; BackProp Algorithm; Efficient BackProp; Neural Net Demo; Practical Tutorial; MNIST Demo

  • 3.14
    Spring Break
    Solutions: Midterm Solutions

  • 3.21
    Topics: K-Means; Expectation Maximization; Jensen's Inequality; Entropy, KL Divergence, and Free Energy; Applying EM to Mixture Models
    Reading: PRML (9) and ESL (14.3, 8.5)
    Notes: EM Notes; Slides; EM and Thermodynamics; Clustering Demo

  • 3.28
    Topics: Review Midterm; EM Algorithm
    Reading: PRML (9, 13)
    Notes: Applying EM; Hidden Markov Models
    Assignments: Homework 3 (data, Matlab)

  • 4.4
    Topics: Deriving Update Equations for EM; Hidden Markov Models; Forward-Backward Algorithm; Gamma Algorithm
    Reading: PRML (13)
    Notes: EM for Bernoulli Tutorial; Forward-Backward Notes; Practical Tutorial; Implementation

  • 4.11
    Topics: Viterbi Algorithm; Baum-Welch Algorithm; Lab Session
    Reading: PRML (13)
    Notes: State Space Models; Baum-Welch
    Assignments: Project

  • 4.18
    Topics: Kalman Filtering; Numerical Integration; Stochastic Processes
    Reading: PRML (11)
    Notes: Kalman Filtering; Gaussian Convolutions; Ergodic Theorem; Interactive Demo; Filtering Demo

  • 4.25
    Topics: Importance and Rejection Sampling; Metropolis-Hastings
    Reading: PRML (11)
    Notes: Markov Chain Monte Carlo; Quantum Monte Carlo
    Assignments: Homework 4 (Lab)

  • 5.2
    Topics: Brownian Motion and the Heat Equation; Path Integration; Sequential Monte Carlo and Particle Filtering
    Reading: PRML (11)
    Notes: Einstein's Theory; Sequential Monte Carlo; Particle Filtering Tutorial; MCMC Examples