Readings

Lecture 1: Good background on some of the concept classes we will be studying in this course can be found in the notes on ``Classes of Boolean Functions'' that are available from Nader Bshouty's web page here. For the first two-thirds of the course, only the material in Sections 1-4 will be relevant.

First unit: on-line mistake bound learning.

Lecture 2: A nice overview of the online mistake bound learning model can be found in section 3 of Avrim Blum's survey paper ``Online algorithms in Machine Learning'' that is available here. Among other things, this paper describes the online algorithm for learning decision lists that we'll be using as our first main algorithmic tool (to learn decision trees in quasipolynomial time and DNF formulas in subexponential time).

The paper that first showed that any size-s decision tree is equivalent to some log(s)-decision list is by Avrim Blum, "Rank-r Decision Trees are a Subclass of r-Decision Lists," Information Processing Letters 42:183--185, 1992. You can get the paper online here by accessing "Information Processing Letters" online through the Columbia library system.

The paper that first showed that any size-s DNF formula is equivalent to some sqrt{n log s log n}-decision list is by Nader Bshouty, "A Subexponential Exact Learning Algorithm for DNF Using Equivalence Queries"; you can get it here.

Lecture 3: The Perceptron Convergence Theorem can be found in many sources, including the book by Minsky and Papert referenced below.

The original paper on the Winnow algorithm is available here.

The paper of Maass and Turan giving a poly(n) time mistake bound algorithm for learning arbitrary linear threshold functions over {0,1}^n is in ``Computational Learning Theory and Natural Learning Systems: Volume I: Constraints and Prospects'' edited by Hanson, Drastal and Rivest; the book is in the engineering library.

The lower bounds on PTF degree we discussed can be found in Beigel's survey paper ``The polynomial method in circuit complexity'' (section 6 on signs of polynomials over the integers); you can get the paper here.

The original source for the PTF degree lower bound for PARITY is Minsky and Papert's book ``Perceptrons: an introduction to computational geometry"; this contains nice descriptions of the Perceptron algorithm and Perceptron Convergence Theorem as well. It's available at the library. The log(s) upper and lower bound for PTF degree of size-s decision trees is folklore.

Lecture 4: The Minsky/Papert book is also the original source for the Omega(n^{1/3}) lower bound on the PTF degree of the DNF we covered in class.

The n^{1/3}log s upper bound on PTF degree for any s-term DNF formula is from Klivans/Servedio "Learning DNF in Time 2^{\tilde{O}(n^{1/3}}"; you can find it here.

Here is a paper with an overview of known results on PTF degree bounds and their applications in learning theory, but no proofs. (The paper talks about the PAC learning model but you can substitute the phrase ``online learning model'' everywhere.)

Lecture 5: The upper bounds on PTF degree for low-weight intersections of halfspaces is essentially in Section 3 of the paper ``PP is Closed under Intersection'' by Beigel, Reingold and Spielman. You can find the paper here. The connections with efficient learnability are from Klivans, O'Donnell, Servedio ``Learning intersections and thresholds of halfspaces'' (section 4) which you can find here.

The lower bounds on PTF degree of an AND of two majorities is from O'Donnell, Servedio ``New degree bounds for polynomial threshold functions'' (section 5); you can find it here.


Second unit: uniform distribution PAC learning.

Lecture 6: A good source for the basics of PAC learning is the textbook by Kearns and Vazirani ``An introduction to computational learning theory''; for our purposes Chapter 1, Section 4.2, and Appendix 9 are useful.

For a different approach to learning DNF under uniform in quasipolynomial time (based on greedy set cover rather than Winnow), see Verbeurgt's COLT (Conference on Computational Learning Theory) 1988 paper (you can find COLT 1988 in the Kosoresow memorial library on the second floor of the CS department).

Lecture 7: A very useful source for basic Fourier analysis over the Boolean cube as it is used in learning theory is the survey by Mansour; you can get it here.

Another good survey that contains some more recent results (as well as a nice presentation of the older material) is a recent paper by Kobler and Lindner; you can find it here.

The low-degree algorithm is originally from Linial, Mansour and Nisan's 1993 JACM paper on ``Constant depth circuits, Fourier transform and learnability" (available through the ACM digital library using any Columbia machine). Mansour's survey gives a good exposition.

Lecture 8: The Fourier concentration bounds for constant-depth circuits presented in today's lecture are from Linial, Mansour and Nisan's 1993 JACM paper on ``Constant depth circuits, Fourier transform and learnability". See also Mansour's survey article.

Lecture 9: The material on noise sensitivity and the resulting Fourier concentration bounds for halfspaces and functions of k halfspaces can be found in Klivans/O'Donnell/Servedio ``Learning intersections and thresholds of halfspaces'' (section 3).

Lecture 10: The material on influences of variables in monotone Boolean functions and the resulting 2^{\tilde{O}(sqrt{n}/\eps} time algorithm for learning monotone Boolean functions can be found in Bshouty/Tamon's 1996 JACM paper "On the Fourier spectrum of monotone functions"; this is available from the ACM digital library from any Columbia machine.

A poly(n,s^{1/\eps^2}) time algorithm for learning monotone size-s decision trees from uniform random examples only can be found in O'Donnell/Servedio's 2006 paper "Learning monotone decision trees in polynomial time"; you can find it here.

Lecture 11: The KM algorithm for learning decision trees in MQ+uniform learning model (and more generally, learning functions with bounded L_1 norm) is described in Mansour's survey; you can also find the original KM paper here.

The Harmonic Sieve algorithm for learning DNF in polynomial time in the MQ+uniform model. This algorithm is due to Jeff Jackson; you can find the paper here. -->