Statistical learning
- L.G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984
- Original PAC learning paper
- A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth. Occam’s razor. Information Processing Letters, 24:377-380, 1987.
- Finite hypothesis classes / bit complexity
Empirical process theory
- V.N. Vapnik, A.Ya. Červonenkis. On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya, 16(2):264-279, 1971.
- A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929-965, 1989.
- Another paper about VC dimension
- P.L. Bartlett, S. Mendelson. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research, 3:463-482, 2002.
- Relates Rademacher complexity, Gaussian complexity, and uniform convergence
- R. Meir, T. Zhang. Generalization error bounds for Bayesian mixture algorithms. Journal of Machine Learning Research, 4:839-860, 2003.
- Rademacher-based analysis of study Bayesian mixture algorithms; Lipschitz contraction property for Rademacher averages without absolute value; convex duality-based bounds on Rademacher complexity.
Boosting and margins
- Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256-285, 1995.
- Introduces Boost-by-Majority
- Y. Freund, R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
- R.E. Schapire, Y. Freund, P. Bartlett and W.S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998.
- Margin bound for weighted majority classifiers
- A.R. Klivans, R.A. Servedio. Boosting and hard-core set construction. Machine Learning, 51(3):217-238, 2003.
- Connection to hardness amplification
- R.E. Schapire. Drifting games. Machine Learning, 43(3):265-291, 2001.
- Introduces drifting games, with analysis of Boost-by-Majority
Convex surrogate losses
- P. Long, R. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 78(3):287-304, 2010.
- Failure of boosting with convex losses when labels are noisy
- T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56-85, 2004
- Develops theory of classification calibration with surrogate losses
- P.L. Bartlett, M.I. Jordan, J.D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006.
- Develops theory of classification calibration with surrogate losses
- P. Auer, M. Herbster, M.K. Warmuth. Exponentially Many Local Minima for Single Neurons. Advances in Neural Information Processing Systems 8:315–322, 1995.
- Introduces the “matching loss” for learning with link functions, and establishes non-convexity of empirical squared loss risk with monotone link functions.
Online learning with experts
Multi-armed bandits
- P. Auer, N. Cesa-Bianchi, Y. Freund, R.E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
- Introduces Exp3 algorithm (among many others).
- P. Auer, N. Cesa-Bianchi, P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235-256, 2002.
- Provides a non-asymptotic analysis of the Upper Confidence Bound algorithm.
- D.A. Freedman. On tail probabilities for martingales. Annals of Probability, 3(1):100-118, 1975.
- Tail probability inequalities for martingale difference sequences; ubiquitous in papers about multi-armed bandits and other sequential decision-making problems.
Spectral clustering
- F. McSherry. Spectral partitioning of random graphs. Proceedings 42nd IEEE Symposium on Foundations of Computer Science, 2001.
- Spectral clustering approaches to finding planted partitions in random graphs (related to stochastic block models).
- N. Alon, M. Krivelevich, B. Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457-466, 1998.
- Spectral algorithm for finding cliques of size \(\Omega(\sqrt{n})\) planted in \(G(n,1/2)\) random graphs.