L.-Y. Tan, J. Thaler, and R. Servedio.

We study the challenging problem of learning decision lists and halfspaces attribute-efficiently, giving both positive and negative results.

Our main positive result is a new tradeoff between the running time and mistake bound for learning length-k decision lists over n Boolean variables. When the allowed running time is sufficiently large, our new mistake bound improves significantly on the mistake bound of the best previous algorithm, due to [KS06].
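For concreteness, a length-k decision list is a nested sequence of k "if literal, then output bit" rules followed by a default bit. A minimal evaluation sketch (the particular rules, indices, and output bits below are illustrative, not taken from the paper):

```python
def eval_decision_list(rules, default, x):
    """Evaluate a decision list on Boolean input x.

    rules: list of (index, literal_value, output) triples; the list outputs
    the 'output' bit of the first rule whose literal fires (x[index] ==
    literal_value), and the default bit if no rule fires.
    """
    for i, val, out in rules:
        if x[i] == val:
            return out
    return default

# A length-3 decision list over 5 Boolean variables (illustrative):
# "if x2 = 1 output 1; else if x0 = 0 output 0; else if x4 = 1 output 1; else 0"
rules = [(2, 1, 1), (0, 0, 0), (4, 1, 1)]
print(eval_decision_list(rules, 0, [1, 0, 1, 0, 0]))  # first rule fires -> 1
print(eval_decision_list(rules, 0, [1, 0, 0, 0, 0]))  # no rule fires -> default 0
```

Attribute-efficient learning asks for a mistake bound depending mostly on the length k of the target list rather than on the total number n of variables.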

Our main negative result is a new lower bound on the weight of any degree-$d$ polynomial threshold function (PTF) that computes a particular decision list over $k$ variables (the ODD-MAX-BIT function). The main result of [Beigel94] is a weight lower bound of $2^{\Omega(k/d^2)}$, which was shown to be essentially optimal for $d \leq k^{1/3}$ by [KS06]. Here we prove a $2^{\Omega(\sqrt{k/d})}$ lower bound, which improves on Beigel's lower bound for $d > k^{1/3}$. This lower bound establishes strong limitations on the effectiveness of the [KS06] approach and suggests that it may be difficult to improve on our positive result. The main tool used in our lower bound is a new variant of Markov's classical inequality, which may be of independent interest; it provides a bound on the derivative of a univariate polynomial in terms of both its degree and the size of its coefficients.
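The ODD-MAX-BIT function outputs a bit determined by the parity of the position of the highest-index 1 in its input, which makes it expressible as a length-k decision list. A minimal sketch (the output convention on the all-zeros input is an assumption, not specified above):

```python
def odd_max_bit(x):
    """ODD-MAX-BIT over k = len(x) Boolean variables: output 1 if the
    largest (1-based) index i with x[i-1] == 1 is odd, else 0.
    The choice of 0 on the all-zeros input is an assumed convention."""
    for i in range(len(x), 0, -1):  # scan from the highest index down
        if x[i - 1] == 1:
            return i % 2
    return 0

# Viewed as a decision list:
# "if x_k = 1 output (k mod 2); else if x_{k-1} = 1 output (k-1 mod 2); ...; else 0"
print(odd_max_bit([1, 0, 1, 0]))  # highest 1 is at index 3 (odd) -> 1
print(odd_max_bit([1, 0, 0, 1]))  # highest 1 is at index 4 (even) -> 0
```

A degree-$d$ PTF for this function is a sign representation $\mathrm{sign}(p(x))$ with $p$ an integer polynomial of degree $d$; its weight is the magnitude of $p$'s coefficients, which the lower bounds above constrain.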

Link to open-access JMLR Workshop & Conference Proceedings version