Efficient average-case population recovery in the presence of insertions and deletions.
F. Ban, X. Chen, R. Servedio, and S. Sinha.
23rd International Workshop on Randomness and Computation (RANDOM), 2019, pp. 44:1-44:18.


Abstract:

A number of recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string $x$ from independent traces of $x$. While the asymptotically best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, several highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version of the problem, in which the source string $x$ is chosen uniformly at random from $\{0,1\}^n$.
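
To make the channel concrete, here is a minimal Python sketch of an illustrative insertion/deletion channel producing traces of a source string. The specific parameters (delete_prob, insert_prob) and the way insertions are interleaved are hypothetical choices for illustration, not the exact channel model analyzed in the cited works.

import random

def trace(x, delete_prob=0.1, insert_prob=0.05):
    # Produce one trace of the bit string x: each original bit may be
    # preceded by inserted uniformly random bits, and is itself deleted
    # with probability delete_prob.
    out = []
    for bit in x:
        while random.random() < insert_prob:
            out.append(random.randint(0, 1))   # inserted random bit
        if random.random() >= delete_prob:
            out.append(bit)                    # bit survives deletion
    return out

# x chosen uniformly at random from {0,1}^n, as in the average-case setting
n = 20
x = [random.randint(0, 1) for _ in range(n)]
print(trace(x))
print(trace(x))  # independent traces of the same x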

In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, rather than a single unknown source string, there is an unknown distribution over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample given to the algorithm is independently generated by drawing some $x^i$ from this distribution and outputting an independent trace of $x^i$.
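
The sampling model can be summarized with a short sketch that reuses the trace() function sketched above; the support, weights, and channel parameters below are illustrative only.

import random

def sample_from_population(support, weights, delete_prob=0.1, insert_prob=0.05):
    # One sample from the population-recovery oracle: draw a source
    # string x^i according to the unknown distribution D (here given by
    # `weights` over `support`), then return an independent trace of it.
    x = random.choices(support, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob)

# hypothetical instance with s random source strings and a random distribution
n, s = 20, 3
support = [[random.randint(0, 1) for _ in range(n)] for _ in range(s)]
raw = [random.random() for _ in range(s)]
weights = [w / sum(raw) for w in raw]
print(sample_from_population(support, weights))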

Building on the results of \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size $1 \leq s \leq \exp(\Theta(n^{1/3}))$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, and for every distribution $\mathcal{D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm can efficiently recover $\mathcal{D}$ up to total variation distance at most $\varepsilon$ with high probability, given access to independent traces of independent draws from $\mathcal{D}$ as described above. The running time of our algorithm is $\mathrm{poly}(n,s,1/\varepsilon)$ and its sample complexity is $\mathrm{poly}(s,1/\varepsilon,\exp(\log^{1/3} n))$. This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version of the problem (in which $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), where the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$.
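
The accuracy guarantee is stated in total variation distance; as a reminder of the metric, a minimal sketch on finitely supported distributions (represented here as dictionaries, an illustrative choice) is:

def total_variation(p, q):
    # Total variation distance between two finitely supported
    # distributions, represented as dicts mapping outcomes (e.g. bit
    # strings as tuples) to probabilities.
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)

# the guarantee: the recovered distribution D_hat satisfies
# total_variation(D, D_hat) <= eps with high probability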

