Readings

Handouts

L.G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984
- Original PAC learning paper
A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth. Occam’s razor. Information Processing Letters, 24:377-380, 1987.
- Finite hypothesis classes / bit complexity

V.N. Vapnik, A.Ya. Červonenkis. On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya, 16(2):264-279, 1971.
- Original VC paper
A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929-965, 1989.
- Another paper about VC dimension
P.L. Bartlett, S. Mendelson. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research, 3:463-482, 2002.
- Relates Rademacher complexity, Gaussian complexity, and uniform convergence
R. Meir, T. Zhang. Generalization error bounds for Bayesian mixture algorithms. Journal of Machine Learning Research, 4:839-860, 2003.
- Rademacher-based analysis of study Bayesian mixture algorithms; Lipschitz contraction property for Rademacher averages without absolute value; convex duality-based bounds on Rademacher complexity.

Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256-285, 1995.
- Introduces Boost-by-Majority
Y. Freund, R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
- Introduces AdaBoost
R.E. Schapire, Y. Freund, P. Bartlett and W.S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998.
- Margin bound for weighted majority classifiers
A.R. Klivans, R.A. Servedio. Boosting and hard-core set construction. Machine Learning, 51(3):217-238, 2003.
- Connection to hardness amplification
R.E. Schapire. Drifting games. Machine Learning, 43(3):265-291, 2001.
- Introduces drifting games, with analysis of Boost-by-Majority

P. Long, R. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 78(3):287-304, 2010.
- Failure of boosting with convex losses when labels are noisy
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56-85, 2004
- Develops theory of classification calibration with surrogate losses
P.L. Bartlett, M.I. Jordan, J.D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006.
- Develops theory of classification calibration with surrogate losses
P. Auer, M. Herbster, M.K. Warmuth. Exponentially Many Local Minima for Single Neurons. Advances in Neural Information Processing Systems 8:315–322, 1995.
- Introduces the “matching loss” for learning with link functions, and establishes non-convexity of empirical squared loss risk with monotone link functions.

Stone-Weierstrass theorem on Wikipedia.
- A powerful generalization of the classical Weierstrass approximation theorem.
K. Hornik, M. Stinchcombe, H. White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359-366, 1989.
- Uses Stone-Weierstrass to establish approximation capabilities of two-layer neural networks.
A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39:930-944.
- Connection between two-layer neural networks and Fourier transforms.
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337-404, 1950.
- Correspondence between positive definite kernels and reproducing kernel Hilbert spaces.

Y. Freund, R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
- Introduces Hedge
Y. Freund, R.E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79-103, 1999.
- Analysis of Hedge using relative entropy, and proof of von Neumann’s minimax theorem using Hedge.

P. Auer, N. Cesa-Bianchi, Y. Freund, R.E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
- Introduces Exp3 algorithm (among many others).
P. Auer, N. Cesa-Bianchi, P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235-256, 2002.
- Provides a non-asymptotic analysis of the Upper Confidence Bound algorithm.
D.A. Freedman. On tail probabilities for martingales. Annals of Probability, 3(1):100-118, 1975.
- Tail probability inequalities for martingale difference sequences; ubiquitous in papers about multi-armed bandits and other sequential decision-making problems.

S. Dasgupta, L. Schulman. A probabilistic analysis of EM for mixtures of separated, spherical Gaussian. Journal of Machine Learning Research, 8:203-226, 2007.
- Analyzes EM algorithm (a distance-based algorithm) for learning mixtures of Gaussians.
S. Vempala, G. Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841-860, 2004.
- Applies PCA to learning mixtures of Gaussians.

F. McSherry. Spectral partitioning of random graphs. Proceedings 42nd IEEE Symposium on Foundations of Computer Science, 2001.
- Spectral clustering approaches to finding planted partitions in random graphs (related to stochastic block models).
N. Alon, M. Krivelevich, B. Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457-466, 1998.
- Spectral algorithm for finding cliques of size \(\Omega(\sqrt{n})\) planted in \(G(n,1/2)\) random graphs.