Alexandr Andoni | Advancing the algorithmic foundations of massive data
The impact of huge data sets is hard to understate, responsible for advancing almost every scientific and technological field, from machine learning and personalized medicine, to speech recognition and translation.
The flip side of the data revolution is that massive data has rendered many standard algorithms computationally expensive, necessitating a wholesale rethinking of even the most basic computational methods. To take one example: Nearest neighbors search, the classic method since the 70s for finding similarity among objects, is straightforward for small data sets: compare every object to every other one and compute their similarity for each pair. But as the number of objects increases into the billions, computing time grows quadratically, making the task prohibitively expensive, at least in terms of traditional expectations.
Alexandr Andoni, a theoretical computer scientist focused on developing algorithmic foundations for massive data, sees the need to reframe the issue: “The question today is not ‘what can we solve in polynomial time?’ but ‘what is possible in time proportional to data size, or even less?’ With all this data, what is the best we can do given the resources?
Fortunately, vast improvements are possible in both theory and practice once we settle for approximate answers.”
Rather than searching an entire data set for the single most nearest neighbor, a search would go much faster if objects were pre-grouped according to some shared attribute, making it easy to zero in on just the small subset of objects most likely to contain the most similar neighbor. The new challenge then becomes: what attribute shall we use to make such a pre-grouping maximally efficient. The speed-up thus gained reverberates across a wide range of computational methods since nearest neighbors search is ubiquitous and serves as a primitive in higher-level algorithms, particularly in machine learning.
More generally, in the same spirit of relying on (approximate) attributes to speed operations, Andoni has developed a theory of sketching that represents complex objects by smaller, simpler “sketches” that capture the main structure and essential properties of the original objects yet use less (sublinear) space and time to compute. For many tasks, such as estimating similarity of a pair of objects, a sketch may work just as well as a fully realized object. While relaxing strict formulations is happening generally throughout the community in most part by necessity, Andoni is carrying the idea further and is in the forefront of those inventing new primitives and new data structures that explicitly incorporate the concept of sketches.
In early work applying a sketch primitive (Locality Sensitive Hashing) to nearest neighbor search, Andoni in 2006 with Piotr Indyk was able, for the most basic Euclidean distances, to improve over a seminal 1998 algorithm widely used for classification. The Communications of the ACM later (2008, vol. 51) hailed the new primitive as a breakthrough technology that allowed researchers to revisit decades-old problems and solve them faster. Few expected more progress to be possible. Yet when Andoni again revisited the problem (in papers published in 2014 and in 2015), he unexpectedly made more headway, this time by using data-dependent, rather than random, hash functions, a novel idea not previously conceptualized as something useful for improving algorithms.
More seems to be possible by continually re-examining and applying fresh perspectives to classic, well-studied problems. It’s one reason, Andoni is returning to academia after four years at Microsoft Research Silicon Valley, which closed in 2014. One point in Columbia’s favor was the Data Science Institute (DSI) where exceptional researchers from across the university and diverse disciplines study different aspects of data.
Says Kathy McKeown, director of the DSI: “We’re thrilled to have Alex join the Data Science Institute. His research is directly relevant to big data applications where the ability to provide computationally efficient solutions depends on the development of new algorithms.”
Andoni also looks for inspiration from students. “You feel the new energy. Students are excited, and that excitement and enthusiasm is invigorating. It leads you to think about even things you’ve already checked off, to believe there might be new ways of doing things. You want to try again.”
B.S., Computer Science and Engineering and B.S., in Mathematics, Massachusetts Institute of Technology, 2004; M.S., Engineering in Electrical Engineering and Computer Science, 2005; Ph.D., Computer Science, Massachusetts Institute of Technology, 2009; Microsoft Research Silicon Valley, 2010-2014; Visiting scientist, Simons Institute for the Theory of Computing, 2014
– Linda Crane
The Columbia Engineering community has come together to combat the coronavirus pandemic on multiple fronts. In close collabo-ration with the Columbia University Irving Medical Center, we’re leveraging our expertise and innovation to address short term medical needs and long term societal impacts.
Dean Boyce's statement on amicus brief filed by President Bollinger
President Bollinger announced that Columbia University along with many other academic institutions (sixteen, including all Ivy League universities) filed an amicus brief in the U.S. District Court for the Eastern District of New York challenging the Executive Order regarding immigrants from seven designated countries and refugees. Among other things, the brief asserts that “safety and security concerns can be addressed in a manner that is consistent with the values America has always stood for, including the free flow of ideas and people across borders and the welcoming of immigrants to our universities.”
This recent action provides a moment for us to collectively reflect on our community within Columbia Engineering and the importance of our commitment to maintaining an open and welcoming community for all students, faculty, researchers and administrative staff. As a School of Engineering and Applied Science, we are fortunate to attract students and faculty from diverse backgrounds, from across the country, and from around the world. It is a great benefit to be able to gather engineers and scientists of so many different perspectives and talents – all with a commitment to learning, a focus on pushing the frontiers of knowledge and discovery, and with a passion for translating our work to impact humanity.
I am proud of our community, and wish to take this opportunity to reinforce our collective commitment to maintaining an open and collegial environment. We are fortunate to have the privilege to learn from one another, and to study, work, and live together in such a dynamic and vibrant place as Columbia.
Mary C. Boyce
Dean of Engineering
Morris A. and Alma Schapiro Professor