Four papers were accepted to the Foundations of Computer Science (FOCS) symposium. CS researchers worked alongside colleagues from various organizations to develop the algorithms.
Authors: Anindya De Northwestern University, Philip M. Long Google, Rocco Servedio Columbia University
The paper is about a new algorithm for learning an unknown probability distribution given draws from the distribution.
A simple example of the problem that the paper considers can be illustrated with a penny tossing scenario: Suppose you have a huge jar of pennies, each of which may have a different probability of coming up heads. If you toss all the pennies in the jar, you’ll get some number of heads; how many times do you need to toss all the pennies before you can build a highly accurate model of how many pennies are likely to come up heads each time? Previous work answered this question, giving a highly efficient algorithm to learn this kind of probability distribution.
The current work studies a more difficult scenario, where there can be several different kinds of coins in the jar — for example, it may contain pennies, nickels and quarters. Each time you toss all the coins, you are told the total *value* of the coins that came up heads (but not how many of each type of coin came up heads).
“Something that surprised me is that when we go from only one kind of coin to two kinds of coins, the problem doesn’t get any harder,” said Rocco Servedio, a researcher from Columbia University. “There are algorithms which are basically just as efficient to solve the two-coin problem as the one-coin problem. But we proved that when we go from two to three kinds of coins, the problem provably gets much harder to solve.”
Authors: Alexandr Andoni Columbia University, Assaf Naor Princeton University, Aleksandar Nikolov University of Toronto, Ilya Razenshteyn
Microsoft Research Redmond, Erik Waingarten Columbia University
This paper gives new algorithms for the approximate near neighbor (ANN) search problem for general normed spaces. The problem is a classic problem in computational geometry, and a way to model “similarity search”.
For example, Spotify may need to preprocess their dataset of songs so that new users may find songs which are most similar to their favorite songs. While a lot of work goes into designing good algorithms for ANN, these algorithm work for specific metric spaces of interest to measure the distance between two points (such as Euclidean or Manhattan distance). This work is the first to give non-trivial algorithms for general normed spaces.
“One thing which surprised me was that even though the embedding appears weaker than established theorems that are commonly used, such as John’s theorem, the embedding is efficiently computable and gives more control over certain parameters,” said Erik Waingarten, an algorithms and computational complexity PhD student.
Authors: Alexandr Andoni Columbia University, Zhao Song Harvard University & UT-Austin, Clifford Stein Columbia University, Zhengyu Wang Harvard University, Peilin Zhong Columbia University
This paper is about designing fast algorithms for problems on graphs in parallel systems, such as MapReduce. The latter systems have been widely-successful in practice, and invite designing new algorithms for these parallel systems. While many classic “parallel algorithms” were been designed in the 1980s and 1990s (PRAM algorithms), predating the modern massive computing clusters, they had a different parallelism in mind, and hence do not take full advantage of the new systems.
The typical example is the problem of checking connectivity in a graph: given N nodes together with some connecting edges (e.g., a friendship graph or road network), check whether there’s a path between two given nodes. The classic PRAM algorithm solves this in “parallel time” that is logarithmic in N. While already much better than the sequential-time of N (or more), the researchers considered to question whether one can do even better in the new parallel systems a-la MapReduce.
While obtaining a much better runtime seems out of reach at the moment (some conjecture impossible), the researchers realized that they may be able to obtain faster algorithms when the graphs have a small diameter, for example if any two connected nodes have a path at most 10 hops. Their algorithm obtains a parallel time that is logarithmic in the diameter. (Note that the diameter of a graph is often much smaller than N.)
“Checking connectivity may be a basic problem, but its resolution is fundamental to understanding many other problems of graphs, such as shortest paths, clustering, and others,” said Alexandr Andoni, one of the authors. “We are currently exploring these extensions.”
Authors: Marshall Ball Columbia University, Dana Dachman-Soled University of Maryland, Siyao Guo Northeastern University, Tal Malkin Columbia University, Li-Yang Tan Stanford University
With this paper, the researchers constructed new non-malleable codes that improve efficiency over what was previously known.
“With non-malleable codes, any attempt to tamper with the encoding will do nothing and what an attacker can only hope to do is replace the information with something completely independent,” said Maynard Marshall Ball, a fourth year PhD student. “That said, non-malleable codes cannot exist for arbitrary attackers.”
The constructions were derived via a novel application of a powerful circuit lower bound technique (pseudorandom switching lemmas) to non-malleability. While non-malleability against circuit classes implies strong lower bounds for those same classes, it is not clear that the converse is true in general. This work shows that certain techniques for proving strong circuit lower bounds are indeed strong enough to yield non-malleability.