Alexandr Andoni | Advancing the algorithmic foundations of massive data

Alexandr Andoni, Associate Professor of Computer Science
The impact of huge data sets is hard to understate, responsible for advancing almost every scientific and technological field, from machine learning and personalized medicine, to speech recognition and translation.
The flip side of the data revolution is that massive data has rendered many standard algorithms computationally expensive, necessitating a wholesale rethinking of even the most basic computational methods. To take one example: Nearest neighbors search, the classic method since the 70s for finding similarity among objects, is straightforward for small data sets: compare every object to every other one and compute their similarity for each pair. But as the number of objects increases into the billions, computing time grows quadratically, making the task prohibitively expensive, at least in terms of traditional expectations.
Alexandr Andoni, a theoretical computer scientist focused on developing algorithmic foundations for massive data, sees the need to reframe the issue: “The question today is not ‘what can we solve in polynomial time?’ but ‘what is possible in time proportional to data size, or even less?’ With all this data, what is the best we can do given the resources?
Fortunately, vast improvements are possible in both theory and practice once we settle for approximate answers.”
Rather than searching an entire data set for the single most nearest neighbor, a search would go much faster if objects were pre-grouped according to some shared attribute, making it easy to zero in on just the small subset of objects most likely to contain the most similar neighbor. The new challenge then becomes: what attribute shall we use to make such a pre-grouping maximally efficient. The speed-up thus gained reverberates across a wide range of computational methods since nearest neighbors search is ubiquitous and serves as a primitive in higher-level algorithms, particularly in machine learning.
More generally, in the same spirit of relying on (approximate) attributes to speed operations, Andoni has developed a theory of sketching that represents complex objects by smaller, simpler “sketches” that capture the main structure and essential properties of the original objects yet use less (sublinear) space and time to compute. For many tasks, such as estimating similarity of a pair of objects, a sketch may work just as well as a fully realized object. While relaxing strict formulations is happening generally throughout the community in most part by necessity, Andoni is carrying the idea further and is in the forefront of those inventing new primitives and new data structures that explicitly incorporate the concept of sketches.
In early work applying a sketch primitive (Locality Sensitive Hashing) to nearest neighbor search, Andoni in 2006 with Piotr Indyk was able, for the most basic Euclidean distances, to improve over a seminal 1998 algorithm widely used for classification. The Communications of the ACM later (2008, vol. 51) hailed the new primitive as a breakthrough technology that allowed researchers to revisit decades-old problems and solve them faster. Few expected more progress to be possible. Yet when Andoni again revisited the problem (in papers published in 2014 and in 2015), he unexpectedly made more headway, this time by using data-dependent, rather than random, hash functions, a novel idea not previously conceptualized as something useful for improving algorithms.
More seems to be possible by continually re-examining and applying fresh perspectives to classic, well-studied problems. It’s one reason, Andoni is returning to academia after four years at Microsoft Research Silicon Valley, which closed in 2014. One point in Columbia’s favor was the Data Science Institute (DSI) where exceptional researchers from across the university and diverse disciplines study different aspects of data.
Says Kathy McKeown, director of the DSI: “We’re thrilled to have Alex join the Data Science Institute. His research is directly relevant to big data applications where the ability to provide computationally efficient solutions depends on the development of new algorithms.”

Andoni also looks for inspiration from students. “You feel the new energy. Students are excited, and that excitement and enthusiasm is invigorating. It leads you to think about even things you’ve already checked off, to believe there might be new ways of doing things. You want to try again.”

B.S., Computer Science and Engineering and B.S., in Mathematics, Massachusetts Institute of Technology, 2004;
M.S., Engineering in Electrical Engineering and Computer Science, 2005;
Ph.D., Computer Science, Massachusetts Institute of Technology, 2009;
Microsoft Research Silicon Valley, 2010-2014;
Visiting scientist, Simons Institute for the Theory of Computing, 2014


Linda Crane