## Papers from the Theory Group Accepted to SODA ’21

Six papers from CS researchers were accepted to the ACM-SIAM Symposium on Discrete Algorithms (SODA ’21). The conference focuses on efficient algorithms and data structures for discrete problems.

Approximate Nearest Neighbors Beyond Space Partitions

Alexandr Andoni *Columbia University*, Aleksandar Nikolov *University of Toronto*, Ilya Razenshteyn *Microsoft Research*, Erik Waingarten *Columbia University*

The paper studies the approximate nearest neighbor problem in high-dimensional spaces. The algorithmic task is to build a data structure that answers queries of the following form: given a query point q, what is the (approximately) closest point to q in a dataset P? This problem is fundamental in modern data analysis, and the paper gives new and improved approximations for one of the most commonly studied families of metric spaces, the ℓp spaces, as well as for generalized versions of the Hamming metric.
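To make the query semantics concrete, here is a minimal Python sketch of the ℓp distance and of the c-approximate nearest neighbor guarantee, checked against a linear-scan baseline. The function names and the approximation factor `c` are illustrative assumptions. The point of the paper is to answer such queries much faster than this scan, which the sketch does not attempt.

```python
from typing import Sequence

def lp_distance(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """The l_p distance (sum |x_i - y_i|^p)^(1/p); p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def exact_nearest(P, q, p):
    """Linear-scan baseline: the point of P closest to q under l_p."""
    return min(P, key=lambda x: lp_distance(x, q, p))

def is_c_approximate(answer, P, q, p, c):
    """c-ANN guarantee: dist(answer, q) <= c * (distance from q to its true nearest neighbor)."""
    best = lp_distance(exact_nearest(P, q, p), q, p)
    return lp_distance(answer, q, p) <= c * best
```

Any point that passes `is_c_approximate` is an acceptable answer. An approximate data structure is free to return such a point instead of the exact nearest one.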

The surprising aspect of this work is its use of data-dependent decomposition schemes for high-dimensional vectors. Partitions based on locality-sensitive hashing (LSH) are well known for ℓp spaces when p ∈ [1, 2], but they fail for larger values of p. The present work shows how to build decompositions that are not LSH but nevertheless solve the approximate nearest neighbor problem efficiently.
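The sketch below gives some intuition for what an LSH partition is. It implements the classic bit-sampling hash family for Hamming space. Each hash reads k random coordinates, and points that agree on those coordinates fall into the same cell. This is the textbook data-independent LSH baseline, not the paper's data-dependent, non-LSH construction. The names `make_bit_sampling_hash`, `partition`, and `collision_probability` are hypothetical.

```python
import random
from collections import defaultdict

def make_bit_sampling_hash(dim, k, rng):
    """One LSH function for Hamming space: read k random coordinates (bit sampling)."""
    coords = [rng.randrange(dim) for _ in range(k)]  # drawn independently, with replacement
    return lambda x: tuple(x[i] for i in coords)

def partition(points, h):
    """Group points into the buckets (cells of the space partition) induced by h."""
    buckets = defaultdict(list)
    for x in points:
        buckets[h(x)].append(x)
    return dict(buckets)

def collision_probability(x, y, k):
    """Pr[h(x) == h(y)] = (1 - d/dim)^k, where d is the Hamming distance between x and y."""
    dim = len(x)
    d = sum(a != b for a, b in zip(x, y))
    return (1 - d / dim) ** k
```

Close points collide more often than far ones, and this gap is exactly the property that an LSH family must provide.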

Graph Spanners by Sketching in Dynamic Streams and the Simultaneous Communication Model

Arnold Filtser *Columbia University*, Michael Kapralov *École polytechnique fédérale de Lausanne*, Navid Nouri *École polytechnique fédérale de Lausanne*

In the dynamic graph stream model, a set of n vertices is given in advance. A player observes a stream of edge insertions and deletions and can use only a very small amount of memory to summarize it. Once the stream ends, the player must answer a query (known in advance) about the resulting graph. Algorithms in this model usually rely on linear sketching techniques. Most notably, Ahn, Guha, and McGregor (2012) showed how to compute a spanning tree using $\tilde{O}(n)$ space, and Kapralov et al. (2014) devised an algorithm for computing a spectral sparsifier.
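A toy example shows the linearity that lets sketches handle deletions. Each vertex keeps the XOR of the IDs of its incident edges, and a deletion applies the same XOR as an insertion. XOR-ing the sketches of a vertex set S cancels every edge inside S, leaving only edges that cross the cut. This is a simplified illustration of the idea behind the Ahn, Guha, and McGregor sketches, which use ℓ0-sampling to cope with many crossing edges. This version recovers a crossing edge only when exactly one exists, and the class name is hypothetical.

```python
class XorIncidenceSketch:
    """Toy linear sketch: vertex v stores the XOR of the IDs of all edges incident to v.

    Inserting and deleting an edge apply the same XOR update, so the sketch depends
    only on the final edge set, not on the order of stream updates.
    """

    def __init__(self, n: int):
        self.n = n
        self.acc = [0] * n

    def _edge_id(self, u: int, v: int) -> int:
        u, v = min(u, v), max(u, v)
        return u * self.n + v + 1  # +1 so that no edge gets ID 0

    def update(self, u: int, v: int) -> None:
        """Process one stream element: an insertion or a deletion of edge {u, v}."""
        e = self._edge_id(u, v)
        self.acc[u] ^= e
        self.acc[v] ^= e

    def lone_cut_edge(self, S):
        """XOR the sketches of S: edges inside S cancel, edges leaving S remain.

        The decoded edge is correct only if exactly one edge crosses the cut;
        returns None when the XOR is zero (e.g., when no edge crosses).
        """
        x = 0
        for v in S:
            x ^= self.acc[v]
        if x == 0:
            return None
        return divmod(x - 1, self.n)
```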

Computing a spanner, or more generally estimating shortest-path distances, is poorly understood in this model. Multi-pass algorithms (which may read the stream several times) were known, but no single-pass algorithm was. This paper provides the first single-pass algorithm that uses $\tilde{O}(n)$ space and estimates all pairwise distances up to a distortion of $\tilde{O}(n^{2/3})$. Even though this distortion is very large, the authors conjecture that it is close to optimal.
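A t-spanner of a weighted graph G is a subgraph H in which every distance is at most t times the corresponding distance in G. The sketch below shows the classic offline greedy construction. It scans edges by increasing weight and keeps an edge only if H does not already connect its endpoints within stretch t. This construction needs the whole graph in memory, so it illustrates the object being computed, not the paper's single-pass sketching algorithm. The function names are hypothetical.

```python
import heapq

def dijkstra(adj, src, dst):
    """Shortest-path distance from src to dst in the weighted graph adj (inf if unreachable)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d  # first pop of dst carries its final distance
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(n, edges, t):
    """Offline greedy t-spanner: keep edge (u, v, w) only if H's current u-v distance exceeds t*w."""
    adj = {v: [] for v in range(n)}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if dijkstra(adj, u, v) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

On the complete graph with four vertices and unit weights, stretch t = 3 keeps only a star of 3 edges, while t = 1 must keep all 6 edges.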

Static and Streaming Data Structures for Fréchet Distance Queries

Arnold Filtser *Columbia University*, Omrit Filtser *Stony Brook University*

The Fréchet distance between two curves P and Q is often described by the man-dog analogy. A man walks along P holding a leash attached to his dog, who walks along Q, and the goal is to find the shortest leash that lets both traverse their curves fully without backtracking. The Fréchet distance is well studied and has numerous applications. Given two curves with n points each, a simple dynamic program computes the Fréchet distance between them in $O(n^2)$ time. However, under a standard complexity assumption, the Strong Exponential Time Hypothesis (SETH), no strongly subquadratic algorithm can compute the Fréchet distance, even if the answer may be approximated up to a factor of 3.
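The simple dynamic program is usually stated for the discrete Fréchet distance, in which the man and the dog hop between consecutive vertices of their curves. Below is a minimal sketch of that quadratic recurrence. It assumes curves are given as lists of points with Euclidean distances and uses `math.dist` (Python 3.8+). It is the classic baseline, not the paper's distance oracle.

```python
import math

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between point sequences P and Q in O(len(P) * len(Q)) time.

    D[i][j] is the shortest leash that lets the man reach P[i] and the dog reach Q[j],
    where in each step the man, the dog, or both advance by one vertex.
    """
    n, m = len(P), len(Q)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])  # Euclidean distance between the current vertices
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            D[i][j] = max(best_prev, d)
    return D[n - 1][m - 1]
```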

To overcome this quadratic barrier, this paper studies distance oracles. Here a curve P is preprocessed in advance so that, given a query curve Q, the Fréchet distance between P and Q can be approximated up to a $1+\epsilon$ factor in linear time. The authors construct a distance oracle with an optimal tradeoff between approximation factor, storage space, and query time.

Surprisingly, even when the curve P is extremely long and its points can be observed only once, in a streaming fashion, the authors construct a distance oracle with exactly the same parameters.

On Multi-Dimensional Gains from Trade Maximization

Yang Cai *Yale University*, Kira Goldner *Columbia University*, Steven Ma *Yale University*, Mingfei Zhao *Yale University*

Think of a two-sided market with a bunch of “sellers,” such as Etsy sellers, Airbnb hosts, or employees, and a bunch of “buyers,” such as Etsy customers, Airbnb renters, or employers. A platform such as Airbnb sits in the middle, matching buyers and sellers. To maximize the platform’s value to the market’s participants, it should maximize the gains from trade, that is, the increase in value in the market due to the matches.

Maximizing gains from trade is known to be impossible, even for one buyer, one seller, and one item, using an algorithm that (1) aligns participants’ incentives, (2) ensures participants don’t regret participating, and (3) does not require the platform to lose money. Further, as with revenue maximization, the problem becomes drastically harder as soon as a buyer is interested in more than one item.
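The impossibility mentioned above is the Myerson–Satterthwaite theorem. A small exact computation shows the tradeoff it forces in the simplest one-buyer, one-seller case. A fixed posted price satisfies properties (1) through (3), but it can leave gains from trade on the table compared with the first-best benchmark E[max(b − s, 0)], which ignores incentives. The two-point value and cost distributions below are made-up illustrative numbers, not data from the paper.

```python
from fractions import Fraction
from itertools import product

def first_best_gft(buyer_dist, seller_dist):
    """E[max(b - s, 0)]: gains if trade happened exactly when value exceeds cost (ignores incentives)."""
    return sum(pb * ps * max(b - s, 0)
               for (b, pb), (s, ps) in product(buyer_dist.items(), seller_dist.items()))

def posted_price_gft(buyer_dist, seller_dist, price):
    """Expected gains of a fixed posted price: trade happens iff s <= price <= b.

    Both sides face the same take-it-or-leave-it price, so accepting exactly when the
    price is favorable is a dominant strategy, nobody trades at a loss, and the buyer's
    payment goes entirely to the seller, so the platform never loses money.
    """
    return sum(pb * ps * (b - s)
               for (b, pb), (s, ps) in product(buyer_dist.items(), seller_dist.items())
               if s <= price <= b)

# Made-up example distributions: {value or cost: probability}.
BUYER = {1: Fraction(1, 2), 3: Fraction(1, 2)}
SELLER = {0: Fraction(1, 2), 2: Fraction(1, 2)}
```

With these numbers, `first_best_gft(BUYER, SELLER)` is 5/4, while every integer price from 0 to 3 yields expected gains of exactly 1.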

This paper investigates the setting where a buyer is interested in many different items, each owned by a different seller, and gives the first guarantee for gains from trade in this setting. It provides an $O(\log n)$-approximation to the optimal gains from trade subject to properties (1) through (3). It does so using a combination of simple mechanisms: fixed posted pricings, buyer offering mechanisms, and a new “seller-adjusted posted price” mechanism. In some instances, the new mechanism is surprisingly capable of earning far more gains from trade than the others.

Polynomial-Time Trace Reconstruction in the Smoothed Complexity Model

Xi Chen *Columbia University*, Anindya De *University of Pennsylvania*, Chin Ho Lee *Columbia University*, Rocco A. Servedio *Columbia University*, Sandip Sinha *Columbia University*

In the trace reconstruction problem, an unknown source string is sent through a probabilistic deletion channel. The channel independently deletes each bit with some fixed probability (the deletion rate) and concatenates the surviving bits, yielding a trace of the unknown string. The goal is to reconstruct the unknown string from independent traces of it.
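A deletion channel is easy to simulate, which is handy when experimenting with reconstruction ideas. Below is a minimal sketch that assumes a binary source string and a deletion rate `q`, with hypothetical helper names. Every trace is a subsequence of the source, and the second helper checks this.

```python
import random

def deletion_channel(source: str, q: float, rng: random.Random) -> str:
    """Return one trace: delete each bit independently with probability q, keep the rest in order."""
    return "".join(bit for bit in source if rng.random() >= q)

def is_subsequence(trace: str, source: str) -> bool:
    """True if trace can be obtained from source by deletions alone."""
    it = iter(source)
    return all(ch in it for ch in trace)  # each `in` consumes the iterator up to the match
```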

The main result of the paper is a polynomial-time algorithm for the trace reconstruction problem in the smoothed analysis model, where any “worst-case” string is perturbed and the goal is to reconstruct its perturbed version with high probability.

The researchers’ approach is based on reconstructing a string from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of the work is a new efficient procedure for reconstructing the multiset of all $O(\log n)$-length subwords of any source string using its traces.
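The subword approach can be illustrated on tiny inputs, assuming “subword” means a contiguous substring, which is the usual convention in this literature. The sketch computes the multiset of length-k subwords and then brute-forces every binary string consistent with it. It only illustrates why longer subwords pin down a string. It is not the paper’s procedure, which recovers the multiset from traces efficiently. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def subword_multiset(s: str, k: int) -> Counter:
    """Multiset of all length-k contiguous subwords of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def consistent_strings(spectrum: Counter, n: int, k: int, alphabet: str = "01") -> list:
    """Brute force over all length-n strings; keep those with the given k-subword multiset."""
    candidates = ("".join(t) for t in product(alphabet, repeat=n))
    return [w for w in candidates if subword_multiset(w, k) == spectrum]
```

For example, with k = 2 the strings 0110, 1011, and 1101 share the same subword multiset, while with k = 3 only 0110 remains.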

Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

Clement Canonne *IBM Research*, Xi Chen *Columbia University*, Gautam Kamath *University of Waterloo*, Amit Levi *University of Waterloo*, Erik Waingarten *Columbia University*

The paper gives a nearly optimal algorithm for testing the uniformity of distributions supported on hypercubes under the subcube conditioning model, in which one can draw samples from the unknown distribution conditioned on a chosen subset of variables taking fixed values. The key technical component is a natural notion of random restrictions for distributions over hypercubes, together with a quantitative analysis of how such a restriction affects the mean vector of the distribution. Along the way, the researchers also consider the problem of mean testing with independent samples and provide a nearly optimal algorithm for it.
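The summary above does not spell out how the restriction is defined. The following is a minimal sketch under an assumed definition:

1. Draw one sample x.
2. Leave each coordinate free with probability `sigma`.
3. Fix the remaining coordinates to the values they take in x.
4. Condition on that subcube.

The paper’s exact definition and parameters may differ, and all names here are hypothetical. The distribution is an explicit table, which is feasible only for tiny n. A real tester would see it only through conditional samples.

```python
import random
from itertools import product

def uniform_cube(n):
    """Uniform distribution over {-1, 1}^n as an explicit {point: probability} dict."""
    return {x: 1 / 2 ** n for x in product((-1, 1), repeat=n)}

def mean_vector(dist):
    """Mean vector E[x] of a distribution over {-1, 1}^n."""
    n = len(next(iter(dist)))
    return [sum(p * x[i] for x, p in dist.items()) for i in range(n)]

def condition_on_subcube(dist, fixed):
    """Subcube conditioning: keep points matching `fixed` ({coordinate: value}) and renormalize."""
    sub = {x: p for x, p in dist.items() if all(x[i] == v for i, v in fixed.items())}
    total = sum(sub.values())
    return {x: p / total for x, p in sub.items()}

def random_restriction(dist, sigma, rng):
    """Hypothetical random restriction: draw x ~ dist, leave each coordinate free
    with probability sigma, fix the others to x's values, then condition."""
    points, weights = zip(*dist.items())
    x = rng.choices(points, weights=weights)[0]
    fixed = {i: x[i] for i in range(len(x)) if rng.random() >= sigma}
    return condition_on_subcube(dist, fixed), fixed
```

For the uniform distribution, every restriction leaves the free coordinates with mean 0. The intuition suggested by the summary is that a restriction of a far-from-uniform distribution tends to expose a nonzero mean on the free coordinates, which mean testing can then detect.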