Readings


Lecture 1: Good background on some of the concept classes we will be studying in this course can be found in the notes on ``Classes of Boolean Functions'' that are available from Nader Bshouty's web page here. For the first two-thirds of the course, only the material in Sections 1-4 will be relevant.

First unit: on-line mistake bound learning.

Lecture 2: A nice overview of the online mistake bound learning model can be found in section 3 of Avrim Blum's survey paper ``Online Algorithms in Machine Learning'' that is available here. Among other things, this paper describes the online algorithm for learning decision lists that we'll be using as our first main algorithmic tool (to learn decision trees in quasipolynomial time and DNF formulas in subexponential time).
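
For concreteness, here is a minimal Python sketch (my own illustrative code, not taken from Blum's survey) of the standard mistake-bound learner for 1-decision lists: all candidate rules start at level 0, prediction uses any rule that fires at the earliest occupied level, and a mistake demotes every wrong rule that fired at that level. For a target list of length L this makes O(nL) mistakes; replacing single literals by conjunctions of up to k literals (roughly n^k candidate rules) is what yields the quasipolynomial and subexponential bounds mentioned above.

# Sketch of the mistake-bound learner for 1-decision lists (illustrative only).
# A rule is (index, value, output): "if x[index] == value, predict output";
# index is None for the two default "else" rules, which always fire.

class DecisionListLearner:
    def __init__(self, n):
        rules = [(None, None, b) for b in (0, 1)]
        rules += [(i, v, b) for i in range(n) for v in (0, 1) for b in (0, 1)]
        self.level = {r: 0 for r in rules}      # every rule starts at level 0

    def _fires(self, rule, x):
        i, v, _ = rule
        return True if i is None else x[i] == v

    def predict(self, x):
        firing = [r for r in self.level if self._fires(r, x)]
        top = min(self.level[r] for r in firing)         # earliest occupied level
        self._active = [r for r in firing if self.level[r] == top]
        return self._active[0][2]                        # any rule at that level

    def update(self, x, y):
        # after seeing the true label y: on a mistake, demote the wrong rules
        if self.predict(x) != y:
            for r in self._active:
                if r[2] != y:
                    self.level[r] += 1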

The paper that first showed that any size-s decision tree is equivalent to some log(s)-decision list is by Avrim Blum, "Rank-r Decision Trees are a Subclass of r-Decision Lists," Information Processing Letters 42:183--185, 1992. You can get the paper online here by accessing "Information Processing Letters" online through the Columbia library system.

The paper that first showed that any size-s DNF formula is equivalent to some sqrt{n log s log n}-decision list is by Nader Bshouty, "A Subexponential Exact Learning Algorithm for DNF Using Equivalence Queries"; you can get it here.
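
Putting these two reductions together with the k-decision-list learner (which maintains roughly n^k candidate rules and so makes n^{O(k)} mistakes), the bookkeeping goes roughly as follows; the exponents below are back-of-the-envelope, not the papers' exact statements:

  \text{size-}s\ \text{decision trees} \subseteq (\log s)\text{-decision lists} \ \Longrightarrow\ n^{O(\log s)}\ \text{mistakes (quasipolynomial time)},

  s\text{-term DNF} \subseteq \sqrt{n \log s \log n}\text{-decision lists} \ \Longrightarrow\ n^{O(\sqrt{n \log s \log n})} = 2^{\tilde{O}(\sqrt{n})}\ \text{mistakes for}\ s = \mathrm{poly}(n)\ \text{(subexponential time)}.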

Lecture 3: A description of the Perceptron algorithm and proof of the Perceptron Convergence Theorem will be passed out in class.
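
Until the handout is available, here is a minimal Python sketch (my own code, not the handout) of the standard Perceptron update. The Perceptron Convergence Theorem says that if some unit vector separates the data with margin gamma and every example satisfies ||x|| <= R, the total number of mistakes is at most (R/gamma)^2.

import numpy as np

def perceptron(examples, n, max_passes=100):
    # examples: list of (x, y) with x an n-dimensional array and y in {-1, +1}
    w = np.zeros(n)
    for _ in range(max_passes):
        mistakes = 0
        for x, y in examples:
            if y * np.dot(w, x) <= 0:   # mistake (or zero margin): update
                w = w + y * x           # the Perceptron update rule
                mistakes += 1
        if mistakes == 0:               # data is separated; stop
            break
    return w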

The original paper on the Winnow algorithm (due to Littlestone) is available here.
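
Roughly speaking, Winnow is Perceptron with multiplicative rather than additive weight updates; the payoff is a mistake bound that depends only logarithmically on the number of irrelevant attributes (O(r log n) for a monotone disjunction of r out of n variables). A minimal sketch, with my own variable names and the usual factor-of-2 updates and threshold n:

def winnow(stream, n):
    # stream: iterable of (x, y) with x in {0,1}^n and y in {0,1}
    w = [1.0] * n
    theta = float(n)                     # fixed threshold
    for x, y in stream:
        total = sum(w[i] for i in range(n) if x[i] == 1)
        pred = 1 if total >= theta else 0
        if pred == 0 and y == 1:         # false negative: promote active weights
            for i in range(n):
                if x[i] == 1:
                    w[i] *= 2.0
        elif pred == 1 and y == 0:       # false positive: demote active weights
            for i in range(n):
                if x[i] == 1:
                    w[i] /= 2.0
    return w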

The paper of Maass and Turan giving a poly(n) time mistake bound algorithm for learning arbitrary linear threshold functions over {0,1}^n is in ``Computational Learning Theory and Natural Learning Systems: Volume I: Constraints and Prospects'' edited by Hanson, Drastal and Rivest; the book is in the engineering library.

The lower bounds on PTF degree we discussed can be found in Beigel's survey paper ``The polynomial method in circuit complexity'' (section 6 on signs of polynomials over the integers); you can get the paper here.

Another good source for this lecture is Minsky & Papert's book ``Perceptrons: an introduction to computational geometry'' -- this contains nice descriptions of the Perceptron algorithm and Perceptron Convergence Theorem, and the original proofs of the PTF lower bounds. You can get it at the library.

Lecture 4: Again, see the Beigel paper and Minsky/Papert book for the lower bounds on PTF degree that we did in class.

The log(s) upper and lower bounds on the PTF degree of size-s decision trees are folklore.
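
One standard way to see the upper bound (a sketch, with details elided): by Blum's reduction, a size-s decision tree equals a (log s)-decision list (C_1, b_1), ..., (C_m, b_m) with default output b_{m+1}, where each C_i is a conjunction of at most log s literals (a degree-(log s) polynomial over {0,1}^n once negated variables are written as 1 - x_j) and each b_i is in {-1,+1}. Then

  p(x) \;=\; \sum_{i=1}^{m} b_i\, 2^{\,m-i+1}\, C_i(x) \;+\; b_{m+1}

is a degree-(log s) PTF for the tree: the first conjunction that fires contributes more in absolute value than everything after it combined. For the lower bound, a size-s tree can compute parity on about log s variables, and parity on k variables requires PTF degree k (a Minsky-Papert lower bound).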

The n^{1/3} log s upper bound on PTF degree for any s-term DNF formula is from Klivans/Servedio "Learning DNF in Time 2^{\tilde{O}(n^{1/3})}"; you can find it here.

Lecture 5: The upper bounds on PTF degree for low-weight intersections of halfspaces are essentially in Section 3 of the paper ``PP is Closed under Intersection'' by Beigel, Reingold and Spielman. You can find the paper here. The connections with efficient learnability are from Klivans, O'Donnell, Servedio ``Learning intersections and thresholds of halfspaces'' (section 4), which you can find here.

The lower bound on the PTF degree of an AND of two majorities is from O'Donnell, Servedio ``New degree bounds for polynomial threshold functions'' (section 5); you can find it here.


Second unit: uniform distribution PAC learning.

Lecture 6: A good source for the basics of PAC learning is the textbook by Kearns and Vazirani ``An introduction to computational learning theory''; for our purposes Chapter 1, Section 4.2, and Appendix 9 are useful.

Lecture 7: For a different approach to learning DNF under uniform in quasipolynomial time (based on greedy set cover rather than Winnow), see Verbeurgt's COLT (Conference on Computational Learning Theory) 1988 paper (you can find COLT 1988 in the Kosoresow memorial library on the second floor of the CS department).

A very useful source for uniform distribution learning results based on Fourier techniques is the survey by Mansour; you can get it here.

Lecture 8: The low-degree algorithm is originally from Linial, Mansour and Nisan's 1993 JACM paper on ``Constant depth circuits, Fourier transform and learnability''; a good exposition is given in Mansour's survey.
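
A minimal Python sketch of the low-degree algorithm (my own code and naming, with inputs in {-1,+1}^n): estimate every Fourier coefficient of degree at most d by an empirical average over uniform random examples, and predict with the sign of the resulting polynomial. If the Fourier weight of the target above degree d is at most epsilon, then with enough samples this hypothesis has error O(epsilon); the sample complexity and running time are n^{O(d)}.

import itertools
import numpy as np

def low_degree_algorithm(samples, n, d):
    # samples: list of (x, y) with x in {-1,+1}^n and y = f(x) in {-1,+1}
    coeffs = {}
    m = len(samples)
    for k in range(d + 1):
        for S in itertools.combinations(range(n), k):
            # chi_S(x) = prod_{i in S} x_i;  hat{f}(S) = E[f(x) chi_S(x)]
            coeffs[S] = sum(y * np.prod([x[i] for i in S]) for x, y in samples) / m
    def hypothesis(x):
        val = sum(c * np.prod([x[i] for i in S]) for S, c in coeffs.items())
        return 1 if val >= 0 else -1
    return hypothesis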

The material on noise sensitivity and the resulting Fourier concentration bounds for halfspaces and functions of k halfspaces can be found in the Klivans/O'Donnell/Servedio paper ``Learning intersections and thresholds of halfspaces'' (section 3).
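
The rough shape of the argument (constants here are from memory and should be checked against the paper): the noise sensitivity NS_epsilon(f) is the probability that f(x) differs from f(y) when x is uniform and y is x with each bit independently flipped with probability epsilon. Two facts drive the learning results:

  \sum_{|S| \ge 1/\epsilon} \hat{f}(S)^2 \;\le\; O\big(\mathrm{NS}_\epsilon(f)\big), \qquad \mathrm{NS}_\epsilon(h) \;=\; O(\sqrt{\epsilon}) \ \text{for any halfspace}\ h,

and noise sensitivity behaves well enough under composition that any function of k halfspaces has NS_epsilon = O(k sqrt{epsilon}). Plugging this concentration into the low-degree algorithm gives uniform-distribution learning of functions of k halfspaces in time roughly n^{O(k^2/\epsilon^2)}.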

Lecture 9: The Fourier concentration bounds for constant-depth circuits presented in today's lecture are from Linial, Mansour and Nisan's 1993 JACM paper on ``Constant depth circuits, Fourier transform and learnability''. See also Mansour's survey article.
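
For reference, the LMN concentration bound has the following shape (up to the exact constants, which I have not rechecked): if f is computed by an AND/OR/NOT circuit of size M and depth d, then

  \sum_{|S| > t} \hat{f}(S)^2 \;\le\; 2M \cdot 2^{-t^{1/d}/20}.

Taking t = O(\log(M/\epsilon))^d makes the right-hand side at most epsilon, so the low-degree algorithm learns constant-depth circuits under the uniform distribution in quasipolynomial time.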

Lecture 10: We finish the LMN theorem and begin the uniform-distribution plus membership query (MQ) learning model, including learning juntas in the MQ+uniform model.

Lecture 11: The KM (Kushilevitz-Mansour) algorithm for learning decision trees in the MQ+uniform learning model (and, more generally, for learning functions with bounded L_1 norm). The KM algorithm is described in Mansour's survey; you can also find the original KM paper here.
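
A rough Python sketch of the coefficient-finding core of KM (my own code; the function names, sample counts, and the oracle interface f are all illustrative assumptions, not the paper's pseudocode). For a prefix alpha over the first k variables, W(alpha) denotes the total squared Fourier weight of all coefficients whose restriction to those variables matches alpha; the algorithm estimates W(alpha) with membership queries, keeps only prefixes with estimated W(alpha) >= theta^2/2 (only O(1/theta^2) can survive at each level, since the W(alpha)'s at a fixed length sum to at most 1), and extends survivors one variable at a time. The actual learner then estimates the surviving coefficients and predicts with the sign of the truncated Fourier expansion.

import random

def estimate_W(f, n, alpha, samples=2000):
    # W(alpha) = E_x [ ( E_y [ f(y,x) * chi_alpha(y) ] )^2 ], estimated here by
    # averaging f(y,x) * f(z,x) * chi_alpha(y) * chi_alpha(z) over random y, z, x.
    k = len(alpha)
    total = 0.0
    for _ in range(samples):
        x = [random.choice((-1, 1)) for _ in range(n - k)]
        y = [random.choice((-1, 1)) for _ in range(k)]
        z = [random.choice((-1, 1)) for _ in range(k)]
        chi_y, chi_z = 1, 1
        for i in range(k):
            if alpha[i] == 1:
                chi_y *= y[i]
                chi_z *= z[i]
        total += f(y + x) * f(z + x) * chi_y * chi_z
    return total / samples

def km_find_large_coefficients(f, n, theta):
    # f: membership oracle taking a list of n values in {-1,+1}, returning +-1
    live = [[]]                               # candidate prefixes, initially empty
    for _ in range(n):
        nxt = []
        for alpha in live:
            for bit in (0, 1):
                beta = alpha + [bit]
                if estimate_W(f, n, beta) >= theta ** 2 / 2:
                    nxt.append(beta)
        live = nxt
    # return the surviving index sets S (positions where the prefix has a 1)
    return [tuple(i for i, b in enumerate(alpha) if b == 1) for alpha in live]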

Lecture 12: The Harmonic Sieve algorithm for learning DNF in polynomial time in the MQ+uniform model. This algorithm is due to Jeff Jackson; you can find the paper here.