Readings

Lecture 1: PAC learning via polynomial threshold functions (PTFs); learning decision trees.

Good background on some of the concept classes we will be studying in this course can be found in the notes on "Classes of Boolean Functions" by Nader Bshouty and Eyal Kushilevitz; a copy can be found here. Relevant material for this lecture is in Sections 1-4.

Background on PAC learning based on low-degree PTFs can be found here (specifically Sections 2 and 3).
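
A quick way to see the algorithmic content: if every function in the class is computed by a PTF of degree d, a learner can expand each example into all monomials of degree at most d and then look for a halfspace over those features that is consistent with the sample (for instance via linear programming). The sketch below is only an illustration of that reduction, not code from the notes; the toy AND target, the use of scipy's linprog, and all names are my own choices.

    # Sketch: PAC learning via low-degree PTFs over {-1,+1}^n.
    # Expand examples into all monomials of degree <= d, then find a
    # consistent halfspace over those features by linear programming.
    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def monomial_features(X, d):
        """All products of at most d coordinates (including the empty product)."""
        feats = [np.ones(len(X))]
        for k in range(1, d + 1):
            for S in itertools.combinations(range(X.shape[1]), k):
                feats.append(np.prod(X[:, list(S)], axis=1))
        return np.column_stack(feats)

    def learn_ptf(X, y, d):
        """Find w with y_i * (w . phi(x_i)) >= 1 for every sample point."""
        Phi = monomial_features(X, d)
        m, D = Phi.shape
        res = linprog(c=np.zeros(D),              # feasibility LP: minimize 0
                      A_ub=-y[:, None] * Phi,     # -y_i (w . phi_i) <= -1
                      b_ub=-np.ones(m),
                      bounds=[(None, None)] * D,
                      method="highs")
        assert res.success
        return res.x

    def predict(w, X, d):
        return np.sign(monomial_features(X, d) @ w)

    # Toy target: x1 AND x2 (as a +-1-valued function) has a low-degree PTF.
    rng = np.random.default_rng(0)
    X = rng.choice([-1, 1], size=(200, 6))
    y = np.where((X[:, 0] == 1) & (X[:, 1] == 1), 1, -1)
    w = learn_ptf(X, y, d=2)
    print((predict(w, X, 2) == y).mean())   # 1.0 on the training sample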

Lecture 2: Low-degree PTFs for DNF formulas.

The relation between rank-k decision trees and k-decision lists is due to Avrim Blum and can be found here.

The paper that first showed that any size-s DNF formula is equivalent to some $\sqrt{n \log s \log n}$-decision list is by Nader Bshouty, "A Subexponential Exact Learning Algorithm for DNF Using Equivalence Queries"; you can get it here.
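
For reference (a standard definition rather than anything specific to these papers): a $k$-decision list is a rule of the form "if $C_1(x)$ then output $b_1$, else if $C_2(x)$ then output $b_2$, ..., else output $b_{m+1}$," where each $C_i$ is a conjunction of at most $k$ literals and each $b_i \in \{0,1\}$. Blum's result converts a rank-$k$ decision tree into exactly such a list, and Bshouty's bound therefore reduces learning DNF to learning decision lists of modest width.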

Lecture 3: The O(n^{1/3} log s) PTF degree upper bound for DNF is from here.

The paper of Beigel, Reingold and Spielman that uses rational functions to construct an O(log W) degree PTF for an intersection of two weight-W halfspaces is here (see Section 3).
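
In rough outline (my paraphrase, not the paper's statement): there is a rational function $R$ of degree $O(\log W)$ with $|R(t) - \mathrm{sign}(t)| \le 1/4$ for all integers $t$ with $1 \le |t| \le W$. If $h_1 = \mathrm{sign}(p_1)$ and $h_2 = \mathrm{sign}(p_2)$ are halfspaces of total integer weight at most $W$, then $h_1 \wedge h_2 = \mathrm{sign}(R(p_1(x)) + R(p_2(x)) - 1)$, since the argument is at least $1/2$ when both halfspaces accept and at most $-1/2$ otherwise; writing $R$ as a ratio of polynomials and multiplying through by the squared denominators (which are positive on the relevant inputs) turns this into a genuine PTF of degree $O(\log W)$.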

The paper that translated these results into the learning setting is here (see Section 4).

Lecture 4: Basics of Fourier analysis.

The basics of Fourier analysis over the Boolean cube can be found in many places, including here (especially Sections 1-3) and here (especially Sections 1 and 2); see also the survey of Mansour referenced in Lecture 5 below. Other great sources are Ryan O'Donnell's Spring 2007 lecture notes and the serialization of the book he is writing on Fourier analysis of Boolean functions; you can find links to both from his web page.
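
For quick reference, the basic setup these sources cover: every $f: \{-1,1\}^n \to \mathbb{R}$ has a unique expansion $f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\,\chi_S(x)$, where $\chi_S(x) = \prod_{i \in S} x_i$; the coefficients are $\hat{f}(S) = \mathrm{E}_x[f(x)\chi_S(x)]$ with $x$ uniform over the cube, and Parseval's identity gives $\sum_S \hat{f}(S)^2 = \mathrm{E}[f(x)^2]$, which equals 1 when $f$ is $\pm 1$-valued.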

Lecture 5: The Low-Degree Algorithm and learning DNF in quasipolynomial time.

The Low-Degree Algorithm is from the 1993 paper of Linial, Mansour and Nisan. The material we covered on Fourier concentration of DNF formulas is from a paper of Mansour. (Note that much of this paper is concerned with a different learning model than the random-examples-only model that we're currently discussing; the portion of the paper that's relevant to what we discussed is Lemma 3.2.) A survey of Mansour covering this material (and more) can be found here.
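
As a concrete illustration of the algorithm itself (a minimal sketch with a toy majority target of my own choosing, not code from any of these papers): estimate every Fourier coefficient of degree at most d from uniform random examples, and predict with the sign of the resulting low-degree polynomial.

    # Sketch of the Low-Degree Algorithm over {-1,+1}^n: estimate
    # hat{f}(S) = E[f(x) chi_S(x)] for every |S| <= d from uniform
    # examples, then predict with the sign of the estimated polynomial.
    import itertools
    import numpy as np

    def chi(X, S):
        """chi_S(x) = prod_{i in S} x_i, evaluated on each row of X."""
        return np.prod(X[:, list(S)], axis=1) if S else np.ones(len(X))

    def estimate_coefficients(X, y, d):
        return {S: np.mean(y * chi(X, S))
                for k in range(d + 1)
                for S in itertools.combinations(range(X.shape[1]), k)}

    def predict(coeffs, X):
        vals = sum(c * chi(X, S) for S, c in coeffs.items())
        return np.where(vals >= 0, 1, -1)

    # Toy target: majority of the first three coordinates.
    rng = np.random.default_rng(1)
    X = rng.choice([-1, 1], size=(5000, 8))
    y = np.sign(X[:, 0] + X[:, 1] + X[:, 2]).astype(int)
    coeffs = estimate_coefficients(X, y, d=3)
    Xt = rng.choice([-1, 1], size=(2000, 8))
    yt = np.sign(Xt[:, 0] + Xt[:, 1] + Xt[:, 2]).astype(int)
    print((predict(coeffs, Xt) == yt).mean())   # should be close to 1.0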

Lecture 6: Fourier concentration for DNF (continued), constant-depth circuits, and monotone functions.

The LMN paper mentioned above also establishes Fourier concentration of constant-depth circuits. The material on Fourier concentration of monotone functions (which we didn't finish) is from a paper of Bshouty and Tamon that is available here.

Lecture 7: Fourier concentration via total influence for monotone Boolean functions (concluded); Gotsman-Linial conjecture; Fourier degree versus max sensitivity.

The paper of Gotsman and Linial that poses the Gotsman-Linial conjecture is here; in addition to the conjecture (in Section 5) there is a lot of other nice material in the paper. The survey of Buhrman and de Wolf on decision tree complexity of Boolean functions is a good source for lots of material; they discuss the "block sensitivity versus sensitivity" question in Section 4.2. (Fourier degree is polynomially related to block sensitivity as explained in their survey.)
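
On the first topic listed above, the key calculation is short enough to record here: the total influence satisfies $\mathrm{I}[f] = \sum_S |S|\,\hat{f}(S)^2$, so by Markov's inequality $\sum_{|S| > t} \hat{f}(S)^2 \le \mathrm{I}[f]/t$. For monotone $f$ each influence equals a degree-1 Fourier coefficient, $\mathrm{Inf}_i[f] = \hat{f}(\{i\})$, so by Cauchy-Schwarz and Parseval $\mathrm{I}[f] \le \sqrt{n}$, giving Fourier concentration up to degree $O(\sqrt{n}/\epsilon)$.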

Lecture 8: Lower bound for learning monotone functions; Modified Low-Degree Algorithm for learning monotone functions; noise sensitivity and Fourier representations of Boolean functions.

Information-theoretic lower bounds on learning monotone functions can be found in the Bshouty/Tamon paper (though not the exact lower bound we sketched in class). The "Modified Low-Degree Algorithm" (though not described as such) can be found here and here. Information on noise sensitivity can be found here and here.
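
For reference, the noise sensitivity of $f$ at rate $\epsilon$ is $\mathrm{NS}_\epsilon(f) = \Pr[f(x) \neq f(y)]$, where $x$ is uniform and $y$ is obtained from $x$ by flipping each coordinate independently with probability $\epsilon$; in Fourier terms, $\mathrm{NS}_\epsilon(f) = \frac{1}{2} - \frac{1}{2}\sum_S (1-2\epsilon)^{|S|}\,\hat{f}(S)^2$, which is what lets noise-sensitivity bounds be converted into Fourier concentration statements.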

Lecture 9: From noise sensitivity to Fourier concentration (continued); bounding the noise sensitivity of halfspaces; learning LTFs and random DNFs using low-degree Fourier coefficients but not using the Low-Degree Algorithm.

Peres's proof upper bounding the noise sensitivity of halfspaces can be found here. A proof of a more general result for degree-d PTFs, inspired by Gopalan's simplification of Peres's proof for halfspaces, can be found here. A paper showing how a linear threshold function can be learned from its Chow parameters can be found here, and papers on learning random DNF formulas from their low-degree Fourier coefficients can be found here, here and here.
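
For reference: the Chow parameters of $f$ are its degree-0 and degree-1 Fourier coefficients $\hat{f}(\emptyset), \hat{f}(\{1\}), \ldots, \hat{f}(\{n\})$. Chow's theorem says that when $f$ is a linear threshold function these $n+1$ numbers determine $f$ uniquely among all Boolean functions, which is what makes learning an LTF from its Chow parameters a well-posed problem.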

Lecture 10: Learning juntas.

The [MOS04] paper that gives an $n^{0.704r} \cdot \mathrm{poly}(n, 2^r)$ runtime for learning $r$-juntas can be found here. The earlier paper of Siegenthaler, which contains the main necessary structural result (that correlation-immune Boolean functions have low Fourier degree), can be found here. A preliminary version of the 2012 paper of G. Valiant that achieves an improved $n^{0.61r} \cdot \mathrm{poly}(n, 2^r)$ runtime for learning $r$-juntas can be found here.
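
For reference: an $r$-junta on $n$ bits is a function that depends on at most $r$ of its coordinates, and a function is $t$-correlation-immune exactly when $\hat{f}(S) = 0$ for every $S$ with $1 \le |S| \le t$; this Fourier characterization is the form of correlation immunity that connects Siegenthaler's degree bound to the junta-learning algorithms above.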

Lecture 11: Learning noisy parity.

The [BKW03] paper that gives a $2^{O(n/\log n)}$-time algorithm for learning parity with noise (for constant noise rates bounded away from 1/2) can be found here.
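
The core step is easy to record (my paraphrase, with constants omitted): if $(x, y)$ and $(x', y')$ are samples of a noisy parity, with $y = \langle a, x\rangle + e \pmod 2$ and $\Pr[e = 1] = \eta$, then $(x + x',\, y + y')$ is again a sample of the same parity, now with noise rate $\frac{1}{2} - \frac{1}{2}(1-2\eta)^2$. BKW split the $n$ coordinates into $\Theta(\log n)$ blocks of size $\Theta(n/\log n)$ and repeatedly add pairs of samples that agree on a block, zeroing out the blocks one at a time; balancing the $2^{O(n/\log n)}$ samples this consumes against the bias of the combined labels, which shrinks with each round of addition, gives the stated running time for any constant noise rate below 1/2.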