Roxana Geambasu and Daniel Hsu Chosen for Google Faculty Research Awards Program

The award is given to faculty at top universities to support research that is relevant to Google’s products and services. The program is structured as a gift that funds a year of study for a graduate student.


Certified Robustness to Adversarial Examples with Differential Privacy
Principal Investigator: Roxana Geambasu, Computer Science Department

The proposal builds on Geambasu’s recent work on providing a “guaranteed” level of robustness for machine learning models against attackers who try to fool their predictions. PixelDP works by randomizing a model’s predictions in such a way as to bound the maximum change an attacker can make to the probability of any label using only a small change in the image (measured in some norm).

Imagine that a building rolls out a face recognition-based authorization system.  People are automatically recognized as they approach the door and are let into the building if they are labeled as someone with access to that building. 

The face recognition system is most likely backed by a machine learning model, such as a deep neural network. These models have been shown to be extremely vulnerable to “adversarial examples,” in which an adversary finds a very small change in appearance that causes the model to classify incorrectly – wearing a specific kind of hat or makeup can cause a face recognition model to misclassify a person it would otherwise have classified correctly, without these “distractions.”

The bound the researchers enforce is then used to assess, for each prediction on an image, whether any attack up to a given norm size could have changed that prediction. If none could have, the prediction is deemed “certifiably robust” against attacks up to that size.
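The certificate check described above can be sketched in a few lines. This is a simplified illustration, not the paper’s exact procedure: it assumes the randomized model’s expected label probabilities satisfy an (ε, δ) differential-privacy-style bound for any attack up to the certified norm size, and the function name and inputs are hypothetical.

```python
import math

def certify(probs, eps, delta):
    """Sketch of a PixelDP-style robustness check for one prediction.

    probs: expected label probabilities of the randomized model on an input.
    (eps, delta): the DP-style guarantee assumed to hold for attacks up to
    the certified norm size L.
    Returns True if the top label provably cannot change under any such attack.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Worst case: the top label's probability shrinks and the runner-up's
    # grows, each as far as the (eps, delta) bound allows.
    worst_top = (probs[top] - delta) / math.exp(eps)
    best_other = math.exp(eps) * probs[runner_up] + delta
    return worst_top > best_other

# A confident prediction certifies; a borderline one does not.
print(certify([0.90, 0.06, 0.04], eps=0.1, delta=1e-3))  # True
print(certify([0.40, 0.35, 0.25], eps=0.1, delta=1e-3))  # False
```

In the building-access example, a `False` result would be the signal to fall back to the key.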


A sample PixelDP DNN: original architecture in blue; the changes introduced by PixelDP in red.

This robustness certificate for an individual prediction is the key piece of functionality that their defense provides, and it can serve two purposes.  First, a building authentication system can use it to decide whether a prediction is sufficiently robust to rely on the face recognition model to make an automated decision, or whether additional authentication is required. If the face recognition model cannot certify a particular person, that person may be required to use their key to get into the building.  Second, a model designer can use robustness certificates for predictions on a test set to assess a lower bound of their model on accuracy under attack.  They can use this certified accuracy to compare model designs and choose one that is most robust for deployment.

“Our defense is currently the most scalable defense that provides a formal guarantee of robustness to adversarial example attacks,” said Roxana Geambasu, principal investigator of the research project.

The project is joint work with Mathias Lecuyer, Daniel Hsu, and Suman Jana. It will develop new training and prediction algorithms for PixelDP models to increase certified accuracy for both small and large attacks. The Google funds will support PhD student Mathias Lecuyer, the primary author of PixelDP, to develop these directions and evaluate them on large networks in diverse domains.


The role of over-parameterization in solving non-convex problems
Principal Investigators: Daniel Hsu, Computer Science Department; Arian Maleki, Department of Statistics

One of the central computational tasks in data science is fitting statistical models to large and complex data sets. These models allow people to reason about and draw conclusions from the data.

For example, such models have been used to discover communities in social network data and to uncover human ancestry structure from genetic data. In order to make accurate inferences, one must ensure that the model fits the data well. This is challenging because the predominant approach to fitting models to data requires solving complex optimization problems that are computationally intractable in the worst case.

“Our research considers a surprising way to alleviate the computational burden, which is to ‘over-parameterize’ the statistical model,” said Daniel Hsu, one of the principal investigators. “By over-parameterization, we mean introducing additional ‘parameters’ to the statistical model that are unnecessary from the statistical point-of-view.”

The figure shows the landscape of the optimization problem for fitting the Gaussian mixture model. 

One way to over-parameterize a model is to take some prior information about the data and regard it as a variable parameter to fit. For instance, in the social network case, the sizes of the communities to be discovered may be known in advance; the model can be over-parameterized by treating those sizes as parameters to be estimated. This over-parameterization would seem to make the model-fitting task more difficult. However, the researchers proved that, for a particular statistical model called a Gaussian mixture model, over-parameterization can be computationally beneficial in a very strong sense.
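The idea can be illustrated with a toy expectation-maximization (EM) routine for a two-component Gaussian mixture. This is a minimal sketch, not the researchers’ actual analysis: the `fit_weights` flag mimics over-parameterization by letting EM estimate mixing weights that are, statistically speaking, already known to be equal.

```python
import numpy as np

def em_gmm(data, n_iter=200, fit_weights=True, seed=0):
    """EM for a two-component 1-D Gaussian mixture with unit variances.

    fit_weights=False fixes the mixing weights at the known value (0.5, 0.5);
    fit_weights=True over-parameterizes the model by estimating them as well.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0, 1, size=2)          # random initial means
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = w * np.exp(-0.5 * (data[:, None] - mu) ** 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update means (and, if over-parameterized, the weights).
        mu = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)
        if fit_weights:
            w = resp.mean(axis=0)
    return mu, w

# Symmetric mixture with true means -2 and +2 and equal weights.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(2, 1, 5000)])
mu, w = em_gmm(data)
print(np.sort(mu))  # close to [-2, 2]
```

The researchers’ theorem concerns when this kind of iterative fitting escapes bad solutions; the sketch only shows where the extra weight parameters enter the computation.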

This result is important because it suggests a way around computational intractability that data scientists may face in their work of fitting models to data.

The aim of the proposed research project is to understand this computational benefit of over-parameterization in the context of other statistical models. The researchers have empirical evidence of this benefit for many other variants of the Gaussian mixture model beyond the one for which their theorem applies. The Google funds will support PhD student Ji Xu, who is jointly advised by Daniel Hsu and Arian Maleki.



New App Can Secure All Your Saved Emails

Columbia Engineering researchers develop Easy Email Encryption, an app that encrypts all saved emails to prevent hacks and leaks, is easy to install and use, and works with popular email services such as Gmail and Yahoo.

CS Researchers Create Technology That Helps Developers with App Performance Testing

Younghoon Jeon and Junfeng Yang

The application performance testing company NimbleDroid was recently acquired by the mobile performance platform HeadSpin. Co-founded by Professor Junfeng Yang and PhD student Younghoon Jeon, NimbleDroid detects bugs and performance bottlenecks during the development and testing of mobile apps and websites.

“I’ve always liked programming but hated the manual debugging process,” said Junfeng Yang, who joined the department in 2008. “So I thought it would be good to use artificial intelligence and program analysis to automate the task of debugging.”

NimbleDroid scans apps for bugs and bottlenecks and sends back a report with a list of issues. The New York Times wrote about how it used the tool to identify bottlenecks in the startup time of its Android app and make it four times faster. Pinterest also used the tool during a testing phase and was able to resolve issues within 21 hours; previously, the company would hear about problems from users only after an app was released and would spend “multiple days” identifying and fixing them.

NimbleDroid has a premier list of customers, some of which also use HeadSpin. These shared customers connected Yang with HeadSpin’s CEO, Manish Lachwani, and both thought NimbleDroid would be a great addition to HeadSpin’s suite of mobile testing and performance solutions. The acquisition was recently announced, with Yang named Chief Scientist of HeadSpin.

Because the initial technology for NimbleDroid started at Columbia, Yang and Jeon worked with Columbia Technology Ventures (CTV) to license the technology. They started a company, got the exclusive license for the technology, and further developed the tool. CTV is the technology transfer office that facilitates the transition of inventions from academic research labs to the market.

“We are excited that our research is widely used by unicorn and Fortune 1000 companies and helps over a billion users,” said Yang. “But our work is not done; we are developing more technologies to make it easy to launch better software faster.”

Columbia Welcomes Rebecca Wright as Inaugural Director of Barnard’s CS Program

Columbia’s computer science community is growing with Barnard College’s creation of a program in Computer Science (CS). Rebecca Wright has been hired as the director of Barnard’s CS program and as the director of the Vagelos Computational Science Center (Vagelos CSC), both of which are located in the Milstein Center.

Wright will lay the groundwork to establish a computer science department that better serves the Barnard community. According to Wright, the goals of Barnard’s CS program are to bring computing education in a meaningful way to all Barnard students, to better integrate Barnard’s CS majors into the Barnard community, and to build a national presence for Barnard in computing research and education. Barnard students have already been able to take CS classes at Columbia and to major in CS by completing the Columbia CS major requirements. The Barnard program will continue to collaborate closely with the Columbia CS department, seeking to add opportunities rather than duplicating existing efforts or changing existing requirements.

“Initial course offerings are expected to focus on how CS interacts with other disciplines, such as social science, lab science, arts, and the humanities,” said Wright, who comes to Columbia from Rutgers University. “We will address the different ways it can interact with various disciplines and ways to advance those disciplines, but with a focus on how to advance computer science to meet the needs of those disciplines.”

Wright sees room to create more opportunities for students to see the full spectrum of computer science – from using the computer as a tool at one end, to designing new algorithms, implementing new systems, and working at the forefront of computer science at the other. Barnard will enable students to find more places along that spectrum, becoming fluent in the underlying tools and mechanisms and able to reason about them, create them, and combine them in new ways.

The first course, taught by Wright, will be offered next fall. It is currently being developed and will most likely fall under her research interests: security, privacy, and cryptography. She is also working on building the faculty through both tenure-stream professors and a new teaching and research fellows program.

For now, students can continue to visit Barnard’s CSC and CS facilities on the fifth floor of the Milstein Center, including making use of the Computer Science and Math Help Room for guidance from tutors, studying or relaxing in the CSC social space, and enrolling in CSC workshops.

Wright encourages students to visit the Milstein Center: “I love walking through the library up to our offices.” The space is open and a modern take on a library – much like how she envisions the computer science program developing.

“Computing has an impact on advances in virtually every field today,” said Wright. “I am excited to see what we develop around these multidisciplinary interactions and interpretations of computing.”

Finding Ways To Study Cricket

Professor Vishal Misra is an avid fan of cricket and now works on research that looks at the vast amount of data on the sport.

“I live in two worlds – one where I am a computer science professor and the other where I am ‘the cricket guy’,” said Vishal Misra, who has been with the department since 2001 and recently received the Distinguished Alumnus of IIT Bombay award.

For the most part, Misra has kept these two worlds separate – until last year, when he worked on research with colleagues at MIT to forecast the evolution of a cricket match’s score.

When a game is cut short by rain, a statistical system is in place – the Duckworth-Lewis-Stern system – which either resets the target or declares the winner if no more play is possible. The researchers’ analysis showed that the current method is biased, and they developed a better one based on the same ideas used to predict the evolution of a game. Their algorithm looks at data from past games and the current game and uses the theory of robust synthetic control to produce a prediction that is surprisingly accurate.
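The core synthetic-control idea can be sketched simply: express the current game’s partial score trajectory as a weighted combination of past games’ trajectories, then extrapolate with the same weights. This is a loose illustration with invented toy data, not the robust variant the researchers developed; the function name and inputs are hypothetical.

```python
import numpy as np

def forecast_score(past_games, current_partial):
    """Synthetic-control-style forecast of a score trajectory (sketch).

    past_games: (n_games, n_overs) array of completed score trajectories.
    current_partial: scores of the current game for its first k overs.
    Fits the current game as a linear combination of past games on the
    observed overs, then applies the same weights to the remaining overs.
    """
    k = len(current_partial)
    # Learn combination weights on the observed prefix via least squares.
    weights, *_ = np.linalg.lstsq(past_games[:, :k].T, current_partial,
                                  rcond=None)
    # Extrapolate: apply the weights to the unobserved remainder.
    return past_games[:, k:].T @ weights

# Toy data: past trajectories accumulate runs roughly linearly per over.
past = np.array([[10.0 * t for t in range(1, 11)],
                 [12.0 * t for t in range(1, 11)],
                 [8.0 * t for t in range(1, 11)]])
observed = np.array([11.0 * t for t in range(1, 6)])  # first 5 overs
print(forecast_score(past, observed))  # continues at roughly 11 runs per over
```

The robust version the researchers use additionally denoises the past-game data before fitting, which is what makes the predictions reliable in practice.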

The first time Misra became involved in the techie side of cricket was through CricInfo, the go-to website for anything to do with the sport. (It is now owned by ESPN.)

Misra (in the middle) on the cricket field, 2014.

In the early 90s, during the internet’s infancy, fans would “meet” in IRC (internet relay chat) rooms to talk about the sport. This was a godsend for Misra, who had moved to the United States from India for graduate studies at the University of Massachusetts Amherst. Cricket was (and still is) not that popular here – imagine living in 1993 and not being able to hop onto a computer or a smartphone to find the latest scores. He would call home or go to a bookstore in Boston to buy Indian sports magazines like Sportstar and India Today.

Through the #cricket chat rooms, he met CricInfo’s founder, Simon King, and together with other volunteers spread across the globe they developed the first website. “It was a site by the fans for the fans – that was always the priority,” Misra shared. They also launched live scorecards and game coverage of the 1996 World Cup. Misra wrote about the experience for CricInfo’s 20th anniversary. He stuck with his PhD studies and remained in the US when CricInfo became a proper business and opened an office in England.

“I did a lot of coding back then, but my first computer science class was the one I taught here at Columbia,” said Misra, who studied electrical engineering for his undergraduate and graduate degrees and joined the department as an assistant professor.

For his PhD thesis, he developed a stochastic differential equation model for TCP, the protocol that carries almost all of the internet’s data traffic. Some of the work he did with colleagues to create a congestion control mechanism based on that model has become part of the internet standard and runs on every cable modem in the world. Cisco took the basic mechanism that they developed, adapted it, and pushed it for standardization. “That gives me a big kick,” said Misra. “That algorithm is actually running almost everywhere.”

Since then, his research focus has been on networking and now includes work on internet economics, an area introduced to him by Richard Ma, a former PhD student who is now a faculty member at the National University of Singapore. They studied network neutrality issues very early on, which led to Misra playing an active part in the net neutrality debate in India, working with the government, regulators, and citizen activists. “India now has the strongest pro-consumer regulations anywhere in the world, and they mirror the definition of network neutrality that I proposed,” he said.

For now, he continues his research on net neutrality and differential pricing. He is also working on data center networking research with Google, where he is a visiting scientist. Another paper is in the works: it makes a fundamental contribution to the theory of synthetic control and, as a fun application, uses the generalized theory to study cricket.

“While I continue my work in networking, I am really excited about the applications of generalized synthetic control. It is a tool that is going to become incredibly important in all aspects of society,” said Misra. “It can be used in applications from studying the impact of legislation or policy, to algorithmic changes in some system – to predicting cricket scores!”