Learning mixtures of product distributions over discrete domains.
J. Feldman and R. O'Donnell and R. Servedio.
SIAM Journal on Computing, 37(5), 2008, pp. 1536--1564.
Preliminary version in 46th Symposium on Foundations of Computer Science (FOCS) , 2005, pp. 501--510.


We consider the problem of learning {\em mixtures of product distributions over discrete domains} in the distribution learning framework introduced by Kearns et al.~\cite{KMR+:94}. We give a $\poly(n/\eps)$ time algorithm for learning a mixture of $k$ arbitrary product distributions over the $n$-dimensional Boolean cube $\{0,1\}^n$ to accuracy $\eps$, for any constant $k$. Previous polynomial time algorithms could only achieve this for $k = 2$ product distributions; our result answers an open question stated independently in~\cite{Cryan:99} and~\cite{FreundMansour:99}. We further give evidence that no polynomial time algorithm can succeed when $k$ is superconstant, by reduction from a notorious open problem in PAC learning. Finally, we generalize our $\poly(n/\eps)$ time algorithm to learn any mixture of $k = O(1)$ product distributions over $\{0,1, \dots, b\}^n$, for any $b = O(1)$.

pdf of conference version

pdf of journal version

Back to main papers page