Agnostically Learning Halfspaces

Agnostically Learning Halfspaces.
A. Kalai and A. Klivans and Y. Mansour and R. Servedio.
SIAM Journal on Computing 37(6), 2008, pp. 1777--1805.
Preliminary version in 46th Symposium on Foundations of Computer Science (FOCS) , 2005, pp. 11--20.

Abstract:

We consider the problem of learning a halfspace in the agnostic framework of Kearns {\em et al.} \cite{KSS:94}, where a learner is given access to a distribution on labelled examples but the labelling may be arbitrary. The learner's goal is to output a hypothesis which performs almost as well as the optimal halfspace with respect to future draws from this distribution. Although the agnostic learning framework does not explicitly deal with noise, it is closely related to learning in worst-case noise models such as malicious noise.

We give the first polynomial-time algorithm for agnostically learning halfspaces with respect to several distributions, such as the uniform distribution over the $n$-dimensional Boolean cube $\bits^n$ or unit sphere in $\Reals^n,$ as well as any log-concave distribution in $\Reals^n$. Given any constant additive factor $\eps > 0$, our algorithm runs in poly$(n)$ time and constructs a hypothesis whose error rate is within an additive $\eps$ of the optimal halfspace. We also show this algorithm agnostically learns Boolean disjunctions in time $2^{\tilde{O}(\sqrt{n})}$ with respect to any distribution; this is the first subexponential-time algorithm for this problem. Finally, we obtain a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere which can tolerate the highest level of malicious noise of any algorithm to date.

Our main tool is a new algorithm which finds a polynomial that best fits a set of points with respect to a particular metric. We show that, in fact, this algorithm is an arbitrary-distribution generalization of the well known ``low-degree'' Fourier algorithm of Linial, Mansour, \& Nisan \cite{LMN:93} and has excellent noise tolerance properties when minimizing with respect to the $L_{1}$ norm. We apply this algorithm in conjunction with a non-standard Fourier transform (which does not use the traditional parity basis) for learning halfspaces over the uniform distribution on the unit sphere; we believe this technique is of independent interest.

pdf of conference version

pdf of journal version

Back to main papers page