Baishakhi Ray Named VMware Early Career Faculty Award Recipient

Assistant professor Baishakhi Ray has won a VMware Early Career Faculty Award to develop machine learning tools that will improve software security. The grant program recognizes the next generation of exceptional faculty members. The gift is made in support of early-career faculty’s research and to promote excellence in teaching.

In today’s world, almost every aspect of our lives is controlled by software. Unfortunately, most software tends to be buggy, often threatening even the most safety- and security-critical software. According to a recent report, 50% of software developers’ valuable time is wasted at finding and fixing bugs costing the global economy around USD$1.1 trillion in 2016 alone.  

“The goal of my research is to address this problem and figure out how to automatically detect and fix bugs to improve software robustness, for both traditional and machine learning-based software,” said Ray, who joined the department in 2018. 

In particular, her research will address two main challenges of software robustness: (i) traditional software have numerous implicit and explicit specifications; it is often difficult to know all of them in advance. (ii) With the advent of machine learning-based systems (e.g., self-driving cars), explicitly providing such specifications involving natural inputs, like image and text, is very hard. 

Ray’s plan is a two-pronged approach. First, she and her team will build novel machine learning models to learn implicit specifications/rules from traditional programs and leverage these rules to detect and fix bugs automatically. However, such techniques are not easily extendable to machine learning-based systems as they follow different software paradigms (e.g., finite state machine vs. neural network). To improve the robustness of such systems, they will also devise new analysis techniques to examine the internal states of the models for potential violations. 

“A successful outcome of this project will produce new techniques to detect and remediate software errors and vulnerabilities with increased accuracy to make software more secure,” said Ray. 

Baishakhi Ray Receives 2019 IBM Faculty Award

IBM has selected assistant professor Baishakhi Ray for an IBM Faculty Award. The highly selective award is given to professors in leading universities worldwide to foster collaboration with IBM researchers. Ray will use the funds to continue research on artificial intelligence-driven program analysis to understand software robustness. 

Although much research has been done, there are still countless vulnerabilities that make system robustness brittle. Hidden vulnerabilities are discovered all the time – either through a system hack or monitoring system’s functionalities. Ray is working to automatically detect system weaknesses using artificial intelligence (AI) with her project, “Improving code representation for enhanced deep learning to detect and remediate security vulnerabilities”.

One of the major challenges in AI-based security vulnerability detection is finding the best source code representation that can distinguish between vulnerable versus benign code. Such representation can further be used as an input in supervised learning settings for automatic vulnerability detection and fixes. Ray is tackling this problem by building new machine-learning models for source code and applying machine learning techniques such as code embeddings. This approach could open new ways of encoding source code into feature vectors. 

“It will provide new ways to make systems secure,” said Ray, who joined the department in 2018. “The goal is to reduce the hours of manual effort spent in automatically detecting vulnerabilities and fixing them.”

A successful outcome of this project will produce a new technique to encode source code with associated trained models that will be able to detect and remediate a software vulnerability with increased accuracy.

IBM researchers Jim Laredo and Alessandro Morari will collaborate closely with Ray and her team on opportunities around design, implementation, and evaluation of this research.

Four papers from Columbia at Crypto 2018

CS researchers worked with collaborators from numerous universities to explore and develop cryptographic tools. The papers were presented at the 38th Annual International Cryptology Conference.

Correcting Subverted Random Oracles
Authors: Alexander Russell University of Connecticut, Qiang Tang New Jersey Institute of Technology, Moti Yung Columbia University, Hong-Sheng Zhou Virginia Commonwealth University

Many cryptographic operations rely on cryptographic hash functions. The structure-less hash functions behave like what is called a random oracle (random looking public function). The random oracle methodology has proven to be a powerful tool for designing and reasoning about cryptographic schemes, and can often act as an effective bridge between theory and practice.

These functions are used to build blockchains for cryptocurrencies like Bitcoin, or to perform “password hashing” – where a system stores a hashed version of the user’s password to compare against the value that is presented by the user: it is hashed, and compared to the  previously stored computed value in order to identify the user. These functions garble the input to look random, and it is hard from the output to find the input or any other input that maps to the same output result.

The paper is in the area of implementations of cryptographic functions that are subverted (aka implemented incorrectly or maliciously). Shared one of the researchers, Moti Yung from Columbia University,” We deal with a hash function that some entries in the function table are corrupted by the opponent – a bad guy who implemented it differently than the specification. The number of corruption is relatively small to the size of the function. What we explored is what can be done with it by the bad guys and can we protect against such bad guys.”

First, the paper shows devastating attacks are possible with this modification of a function: the bad guys can control the blockchain, or allow an impersonator to log in on behalf of a user, respectively, in the above applications.

Then, to counter the attack the researchers demonstrated how to manipulate the function (apply it a few times and combine the results in different ways) and how the new function prevents attacks.

In this paper, the focus is on the basic problem of correcting faulty—or adversarially corrupted—random oracles, so that they can be confidently applied for such cryptographic purposes.

“The construction is not too complex,” said Yung, a computer science professor. “The proof that it works, on the other hand, required the development of  new techniques that show that the manipulation produces a hash function that behaves like a random oracle in spite of the manipulation by the bad guy.”

Authors: Lucas Kowalczyk Columbia University, Tal Malkin Columbia University, Jonathan Ullman Northeastern University, Daniel Wichs Northeastern University

How can one know if the way they are using data is protecting the privacy of the individuals to whom the data belongs? What does it mean to protect an individual’s privacy exactly? Differential privacy is a mathematical definition of privacy that one can objectively prove one’s usage satisfies. Informally, if data is used in a differentially private manner, then any one individual’s data cannot have significant influence on the output behavior.

For example, Apple uses differentially private data collection to collect information about its users to improve their experience, while now being able to additionally give them a mathematical guarantee about how their information is being used.

Differential privacy is a nice notion, but there is a question: is it always possible to use data in a differentially private manner? “Our paper is an impossibility result,” said Lucas Kowalczyk, a PhD student at Columbia University. “We exhibit a setting, an example of data and desired output behavior, for which we prove that differential privacy is not possible to achieve. The specific setting is constructed using tools from cryptography.”

Proofs of Work From Worst-Case Assumptions
Authors: Marshall Ball Columbia University, Alon Rosen IDC, Herzliya, Manuel Sabin UC Berkeley, Prashant Nalini Vasudevan MIT

A long standing goal in the area is constructing NP-Hard cryptography. This would involve proving a “win-win” theorem of the following form: either some cryptographic scheme is secure or the satisfiability problem can be solved efficiently. The advantage of a theorem like this, is that if the scheme is broken it will yield efficient algorithms for a huge number of interesting problems.

Unfortunately, there hasn’t been much progress beyond uncovering certain technical barriers. In the paper the researchers proved win-wins of the following form: either the cryptographic scheme is secure or there are faster (worst-case) algorithms than are currently known for satisfiability and other problems (violating certain conjectures in complexity theory).

“In particular, the cryptography we construct is something called a Proof of Work,” said Maynard Marshall Ball, a fourth year PhD student. “Roughly, a Proof of Work can be seen as a collection of puzzles such that a random one requires a certain amount of computational “work” to solve, and solutions can be checked with much less work.”

These objects appear to be inherent to cryptocurrencies, but perhaps a simpler and older application comes from spam prevention. To prevent spam, an email server can simply require a proof of work in order to receive an email. In this way, it is easy to solve a single puzzle and send a single email, but solving many puzzles to send many emails is infeasible.

A Simple Obfuscation Scheme for Pattern-Matching with Wildcards
Authors: Allison Bishop Columbia University and IEX, Lucas Kowalczyk Columbia University, Tal Malkin Columbia University, Valerio Pastro Columbia University and Yale University, Mariana Raykova Yale University, Kevin Shi Columbia University

Suppose a software patch is released which addresses a vulnerability triggered by some bad inputs. The question is how to publicly release the patch so that all users can apply it, without revealing which bad inputs trigger the vulnerability – to ensure that an attacker will not take advantage of the vulnerability on unpatched systems. 

This can be addressed by what cryptographers call “program obfuscation” — producing a program that anyone can use, but no one can figure out the internal working of, or any secrets used by the program.  Obfuscation has many important applications, but it is difficult to achieve, and in many cases was shown to be impossible. For cases where obfuscation is possible, existing solutions typically utilize complex and inefficient techniques and rely on cryptographic assumptions that are not well understood.

The researchers discovered a surprisingly simple and efficient construction that can obfuscate programs for pattern-matching with wildcards, motivated for example by the software patching application above.  What this means is that a program can be released to check for any given input whether it matches some specific pattern (e.g., the second bit is 1, the third bit is 0, the sixth bit is 0, etc).  

“Crucially, our construction allows you to publicly release such a program, while keeping the actual pattern secret,” said Tal Malkin, a computer science professor, “So people can use the program and information is kept safe.”