Overview of the course

The (planned) list of topics is as follows.

  1. Nearest neighbors
  2. Classification using generative models
  3. Statistical models for prediction
  4. Decision tree learning
  5. Linear regression
  6. Linear classification
  7. Feature maps and kernel methods
  8. Inductive bias and regularization
  9. Dimension reduction
  10. Optimization by gradient methods
  11. Optimization problems and duality
  12. Multi-class linear prediction
  13. Calibration and bias
  14. Generalization theory
  15. Neural networks

In the first four topics (nearest neighbors through decision tree learning), we will give a broad overview of machine learning. We will describe the types of problems to which machine learning is typically applied, the general “machine learning approach” to solving these problems, and a statistical framework for thinking about such problems. We will introduce important paradigms behind machine learning algorithms (memorization, statistical modeling, optimization). We will also present some “best practices” in machine learning for evaluation, hyperparameter tuning/model selection, and model averaging.
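To give a small, concrete taste of the memorization paradigm, here is a minimal sketch of a 1-nearest-neighbor classifier. The function name and toy data are hypothetical illustrations, not code from the course:

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Predict the label of x by copying the label of its closest training point.

    This is memorization in its purest form: the "model" is the training set.
    """
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each training point
    return y_train[np.argmin(distances)]             # label of the nearest one

# Hypothetical toy data: two small clusters in the plane.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(nearest_neighbor_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0
```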

In the next four topics (linear regression through inductive bias and regularization), we will introduce some linear algebraic and geometric approaches to machine learning. The mathematical concept of linearity will be featured throughout as a simplifying yet empowering modeling assumption. Through linear models, we will also explore different ways of introducing prior domain knowledge into the learning process.
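For instance, the simplest of these linear methods, least-squares regression, can be fit in a few lines. The following is only a sketch on synthetic data, with all names and numbers made up for illustration:

```python
import numpy as np

# Hypothetical synthetic data: y is roughly linear in x plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Least-squares fit: minimize ||Xw - y||^2. np.linalg.lstsq solves this directly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to w_true
```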

The next topic (dimension reduction) is a brief interlude into unsupervised learning. We will derive the principal component analysis method and discuss some of its applications.
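As a preview of where that derivation lands, here is a minimal sketch of principal component analysis via the singular value decomposition; the data and function name are hypothetical:

```python
import numpy as np

def pca(X, k):
    """Project data onto its top-k principal components.

    Sketch of the standard recipe: center the data, then take the top
    right singular vectors of the centered data matrix.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T  # coordinates in the top-k subspace

# Hypothetical data: 3-d points lying near a 2-d plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(200, 3))
print(pca(X, 2).shape)  # (200, 2)
```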

The next three topics (optimization by gradient methods through multi-class linear prediction) introduce concepts and methods from mathematical optimization used in machine learning. We will motivate and describe gradient-based optimization methods, as well as automatic differentiation, which are ubiquitous in modern machine learning. We will also introduce Lagrange duality as a technique for understanding machine learning methods. We will apply these methods and concepts to the problem of multi-class linear prediction.
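The basic gradient descent loop itself is only a few lines. Below is a sketch applied to a least-squares objective; the data, step size, and iteration count are hypothetical choices for illustration:

```python
import numpy as np

def gradient_descent(grad, w0, step_size=0.1, num_steps=100):
    """Generic gradient descent: repeatedly step against the gradient."""
    w = w0
    for _ in range(num_steps):
        w = w - step_size * grad(w)
    return w

# Hypothetical objective: f(w) = ||Xw - y||^2 / (2n),
# whose gradient is X^T (Xw - y) / n.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -1.0])
grad = lambda w: X.T @ (X @ w - y) / len(y)
print(gradient_descent(grad, np.zeros(2)))  # approaches [3, -1]
```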

The final three topics (calibration and bias through neural networks) are a hodgepodge of advanced topics. We will discuss the problem of calibration and how it is related to bias in machine learning. We will describe some probabilistic theory for understanding generalization. Finally, we will motivate and describe neural networks in the context of other machine learning methods we have seen in the course.
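One way to see that connection to earlier methods: a feedforward network is just linear models composed with a nonlinearity. Here is a minimal sketch of a two-layer network's forward pass, with all sizes and weights hypothetical:

```python
import numpy as np

def two_layer_network(x, W1, b1, W2, b2):
    """A minimal feedforward network: linear map, nonlinearity, linear map.

    With the nonlinearity removed, the composition collapses to a single
    linear model, which is one way neural networks extend the linear
    methods covered earlier in the course.
    """
    hidden = np.maximum(0.0, W1 @ x + b1)  # ReLU activation
    return W2 @ hidden + b2

# Hypothetical sizes: 4 inputs, 8 hidden units, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
print(two_layer_network(rng.normal(size=4), W1, b1, W2, b2))
```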