Steve Nowick to Host the 2019 NY Scientific Data Summit

Columbia University is co-organizing and hosting the 2019 New York Scientific Data Summit, a two-and-a-half day symposium to explore data-driven discovery and innovation in science and industry. The summit is scheduled for June 12-14 in Columbia’s Davis Auditorium,

What Can You Do With a Computer Science Degree?

Research By CS Teams Part of the IEEE Symposium on Security and Privacy 2019

The IEEE Symposium on Security and Privacy (SP19) brings together researchers and practitioners to present and discuss developments in computer security and electronic privacy. Now in its 40th year, the conference issued Test of Time awards, one of which was bestowed to computer science professor Steven Bellovin. Also at SP19 were papers from other CS researchers, detailed below.

Encrypted Key Exchange: Password-Based Protocols Secure Against Dictionary Attacks
Steven M. Bellovin and Michael Merritt
Test of Time award – Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, May 1992

This paper created the subfield of cryptography now known as PAKE: Password- Authenticated Key Exchange. A Google Scholar search for cryptography password PAKE gets more than 1,300 hits, and that’s an undercount because the early papers do not use “PAKE”; the coinage is somewhat later.

The world of 1992 was moving towards networks. Rather than having a password for every computer and having to enter it constantly, a user would have to download a cryptographic credential from a server. But how was this credential protected? The answer, of course, is the user’s password.

This means that an attacker who can eavesdrop on the network connection can pick up the credential that is encrypted with the password, and see if a guess at the password yields something that looks like a valid credential. For example, a cryptographic credential often includes a username or an expiration date—if you guess at the password, do you get something that looks like a username or a reasonable expiration date? This is known as “verifiable plaintext”.

The underlying problem relates to so-called “password-guessing attacks”. That is, an attacker guesses at a user’s password and tries to verify that guess. The classic attack of this type was described in a 1979 paper by Morris and Thompson; it assumed that the attackers had somehow obtained the “hashed password file”.

Bellovin and Michael Merritt’s solution eliminated verifiable plaintext. Suppose an attacker eavesdrops on network traffic. The cryptographic credential is encrypted with the user’s password. For an incorrect guess, the attacker gets a string of random bytes. However, a correct guess yields something that “looks” like a string of random bytes, and the only way to tell that it’s not is to solve an extremely hard problem, one that may be computationally unsolvable regardless. This means that there is no way for an attacker to verify a guess by looking at network traffic.

Bellovin no longer works on PAKEs but recommends Matt Green’s post for the state of PAKEs today.

Certified Robustness to Adversarial Examples with Differential Privacy
Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana

The paper proposes a new defense against adversarial examples with rigorous guarantees.

Adversarial examples are an attack against machine learning (ML) models, and especially Deep Neural Networks (DNNs). These models are used in many domains to make predictions from input data (e.g. label the contents of a picture, recognize a spoken word, etc.).

In adversarial attacks, an attacker makes a small change to the input data – such as small changes to the pixels of an image, invisible to the naked eye – that fools the ML model into make a wrong prediction chosen by the adversary. These attacks can be particularly problematic when models are used in security sensitive contexts, such as face recognition to enable access to a phone, voice recognition to operate a device, or to detect malware and credit card fraud.

PixelDP is a defense that adds specifically calibrated noise to DNNs and adapts the training and prediction procedures to make them more difficult to attack with adversarial examples. By leveraging the theory of a privacy mechanism, differential privacy, PixelDP can also prove guarantees about the maximal change an attacker can trigger with small changes to the input data, to give the operator a measure of robustness for each specific prediction.

Moreover, this approach is the first one to give both provable guarantees and scale to the large DNNs and datasets that are used in practice. As an example, this is the first defense with rigorous guarantees that supports large DNN models, such as the one called Inception released pre-trained by Google, on tasks such as large scale image classification on the ImageNet dataset.

NEUZZ: Efficient Fuzzing with Neural Program Smoothing
Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, and Suman Jana

Fuzzing is a popular and simple technique to hunt bugs by generating random test inputs and executing the target program with these inputs to trigger potential security vulnerabilities. Due to its simplicity and low performance overhead, fuzzing has been very successful at finding numerous security vulnerabilities in many real-world programs. However, existing works rely on a large number of error-trial scheme and lack sophisticated guidance to generate high-quality inputs which helps to uncover bugs.

The researchers built the first practical neural-network assisted-fuzzer which finds software vulnerabilities in real world programs. The neural network model approximates program behaviors by functioning as a differentiable program surrogate. Inspired from adversarial examples in deep learning, it computes the gradient of the neural program surrogate and performs efficient gradient-guided mutations.

You May Be Able to Find Invisible Malware, But the Fix Is Hard

Tal Malkin Awarded Columbia University Presidential Teaching Award

Columbia University bestows the award to distinguished faculty for their teaching as recognized by the Columbia community. Tal Malkin will receive the award at this year’s commencement.

“Teaching has its own reward above any award and it is something that is very important to me,” said Malkin, an associate professor who joined Columbia in 2003. “I really enjoy teaching so I am very happy to receive this award.”

Malkin can be found in her office, where throughout the day students drop in to talk about their projects or assignments and ask questions. That is one thing that she likes – students that ask questions. She believes that it is through the process of asking questions and working through answers that students actually learn. Through this dialogue she is able to show them the “beauty of computer science theory” and how the material is interesting in its own right.

But, of course, the real joy is when she sees her students discover and understand new concepts. Computer science theory is one of the classes she teaches and it is required for all CS majors. “It isn’t an easy class so when I see that it finally clicks and a student gets it, I feel a sense of accomplishment and pride,” shared Malkin. One could say she hopes that her students “fall in love with computer science” through her class, as she once did as an undergrad in Israel.

“Tal would always explain concepts both intuitively and present material in a variety of ways to appeal to different types of students,” said Daniel Jaroslawicz (CC ’19), a former student and teaching assistant. Considering that the class had over 200 students with different levels of familiarity with computer science theory, achieving that was no simple feat. He continued, “But what really surprised me was Tal’s successful effort to learn the names of as many of her students as possible.”

Aside from teaching undergrads, PhD students are another group under her purview. In this case it is more of mentoring these researchers. Said Malkin, “My PhD students usually study other parts of cryptography so they often teach me more than I teach them.”

Dana Dachman-Soled, a former PhD student, describes Malkin as unassuming and warm but at the same time very assertive, focused, and challenging. Some of her most fond memories with Malkin took place at the Hungarian Pastry Shop where they worked on research projects along with other collaborators. “While the shop is quite dark and often crowded, we did some of our best research there,” said Dana Dachman-Soled (PhD ‘11), who is now an assistant professor at the University of Maryland, College Park. “She was never dismissive of new ideas, thereby encouraging me and her other students to bring them forward and develop them.”

This openness and yearning to impart knowledge to students is reflected in other professional activities. Malkin also heads the education track of the Columbia-IBM Center for Blockchain and Data Transparency. For that, she helps oversee curriculum development for classes on data privacy and blockchain technologies, among other topics.

“I love teaching and my door is always open for anyone that has a question,” said Malkin. “It’s very invigorating to interact with my students and I hope that I have been able to show them how cool computer science can be.”

Tal Malkin is an associate professor of Computer Science at Columbia University, where she directs theCryptography Lab. She received her Ph.D. in Computer Science from the Massachusetts Institute of Technology in 2000, and joined Columbia after three years as a research scientist in the Secure Systems Research Department atAT&T Labs – Research.

Five CS graduates awarded the Jonathan Gross Prize

The Jonathan Gross Prize honors students who graduate at the top of their class with a track record of promising innovative contributions to computer science. Five graduating computer science students—one senior from each of the four undergraduate schools and one student from the master’s program—were selected to receive the prize, which is named for Jonathan L. Gross, the popular professor emeritus who taught for 47 years at Columbia and cofounded the Computer Science Department. The prize is made possible by an endowment established by Yechiam (Y.Y.) Yemini, who for 31 years until his retirement in 2011, was a professor of computer science at Columbia.

Steven Shao
Master of Science

I feel a deep appreciation for the potential of computers to express and realize complex ideas.

A childhood dream of building games for friends propelled Steven Shao into the world of programming and math. His interest in computer science remained with him when he first came to Columbia for undergrad and his appreciation of it has grown even more.

“Whenever I come across a technical idea in CS, like in the algorithms of ML, systems, or theory, there’s something irresistible about exploring and optimizing them further,” said Shao. “It’s like going down a rabbit hole of puzzles.”

He is inspired by how computers can be more human and how it affects society. Through classes, like computer vision, he learned how artificial intelligence powers the mechanisms underlying “this magic”. Shared Shao, “I wholly enjoyed professor Carl Vondrick’s lectures, starting from the long history of traditional vision and its relationship to optics and graphics, to its recent revolution with deep learning.”

Classes wrapped for Shao in December and he recently started as a software developer at Jane Street in downtown New York City.

Alexandre Lamy
Columbia Engineering valedictorian

There are a few aspects that, in my mind, make computer science a unique and beautiful subject to study.

Alexandre Lamy loves the kind of logical thinking and problem solving approach that is needed in computer science to turn vague thoughts and ideas into precise algorithms. Each class he took showed him the beauty of the field and pushed him.

Machine learning with professor Nakul Verma was the class he liked the most. Said Lamy, “The class simultaneously gives you a deep understanding of the foundations of ML and teaches you most of the advanced mathematics you’ll need in more difficult classes – linear algebra, probability, and statistics concepts that are relevant to CS and not really covered in the corresponding required classes.” Lamy ended up working as a teacher’s assistant three times after taking the class and Verma became an important mentor for the rest of his time at Columbia.

Lamy looks forward to solving even more complex problems and formulating algorithms when he returns to Columbia in the fall for a MS in computer science.

“Looking around our world reveals an incredible number of real life problems that have been solved using computers,” he said. “But the amount of problems that are still out there and waiting to be solved seems to defy the imagination.”

Linnie Jiang
Columbia College

I really appreciate how CS relies on a foundation of math so that results can be explained in a clean logical language that is familiar to us.

Computer science can be applied to so many domains and the interdisciplinarity of CS is what appeals to Jiang. While at Columbia her research with professors from the neuroscience department studied how expectation is encoded in the circuitry of the fruit fly (Drosophila melanogaster). Her computational background contributed to a deeper understanding of how experience-dependent learning is largely a circuit phenomenon driven by the interactions of an ensemble of neurons (aka a graph!).

“I was surprised by how continuous and unending research is—if you answer one question, the next immediately pops up,” said Jiang. “It seems like there are endless implications of every discovery you make.”

Jiang will pursue a PhD in Neurosciences at Stanford University, supported by funding from the Stanford Graduate Fellowship as well as the NSF Graduate Research Fellowship. She also plans to obtain a PhD minor in computer science while at Stanford.

April Tang
Barnard College

Being a computer science major allowed me to combine everything I was interested in into one degree.

April Tang came to Barnard knowing that she would study mathematics or statistics, but an intro to Java class opened her up to computer science. She quickly realized she could combine math and statistics with CS because the structural thinking in object-oriented programing and CS theory aligned perfectly with her training from mathematics.

“I was even more amazed to find that I can use my statistics knowledge in real-world applications through the use of machine learning techniques,” said Tang.

That interest in machine learning led her to a natural language processing class. It is within the context of natural language processing where questions that might seem vague in other general machine learning courses become concrete problems. Tang shared, “Having the context in mind really motivates me and allows me to think in more depth about the pros and cons of each method.”

As a Barnard student, Tang took most of her liberal arts requirements at the school. One class she particularly adored was dance in New York City – where coursework was on analyzing the choreography of different dance performances.

In July, she will join Citigroup as a quantitative analyst in their New York office. Tang hopes to study data science and go back to school for a graduate degree in the future.

Rose Orenbuch
General Studies

When I took my first computer science course I realized that I would prefer an analytic and quantitative approach to biological questions.

Rose Orenbuch aspires to an academic career in the biosciences with a focus on genomics and neuroscience. She credits the skills she has gained from her CS courses as useful in conducting research – the ability to program and quickly learn new languages is particularly useful in bioinformatics, whether for statistical analysis or tool building or debugging other people’s programs.

A class that was both a favorite and the most challenging for her was computational genomics taught by professor Itsik Pe’er. Shared Orenbuch, “It was very helpful to understand the algorithms behind the tools I had been using in my research prior to the course.”

For that class, she built a tool for typing human leukocyte antigens (HLAs). HLAs are cell-surface proteins that are responsible for the regulation of the immune system. The tool has since been published and a paper describing it is currently under review at Bioinformatics.

During the summer, she will continue research to develop a tool to detect HLA expression imbalance and investigate its effect on outcomes of immunotherapy. Orenbuch is set to attend Harvard University for a PhD in systems biology in the fall.

CS major Alexandre Lamy named Columbia Engineering 2019 Valedictorian

As valedictorian of the Class of 2019, Alexandre Lamy will receive Columbia Engineering’s highest academic honor. Lamy will also be awarded the School’s Illig Medal and speak at Engineering Class Day on May 20.

Computer-Human Interaction Papers Delve into Livestreaming, Visual Design, and Creative Writing

CS researchers headed to the 2019 ACM CHI Conference on Human Factors in Computing Systems in Glasgow, Scotland. The conference gathers researchers and scientists to examine the ways that technology can be used to improve people and society.

This year’s papers feature women-led research – assistant professor Lydia Chilton presented three projects, including one with Katy Gero, a PhD student. While undergrad Lauren Arnett took part in a poster presentation for work that studied livestreaming.

Below are brief descriptions and links to the research.

VisiBlends: A Flexible Workflow for Visual Blends
Lydia B. Chilton Columbia University, Savvas Petridis Columbia University, Maneesh Agrawala Stanford University

Visual blends are an advanced graphic design technique to draw attention to a message. They combine two objects in a way that is novel and useful in conveying a message symbolically. This paper presents VisiBlends, a flexible workflow for creating visual blends that follows the iterative design process. The tool introduces a design pattern for blending symbols based on principles of human visual object recognition.

“To the average person, it seems that a visual blend requires creative inspiration—an aha! moment—and that there is no exact formula to make one,” said Chilton, in an interview. “We wanted to deconstruct the process of building visual blends and see if there was a way we could make it more accessible to people by coupling the human element with computational methods.”

An illustration of how VisiBlends creates a visual blend for “Starbucks is here for summer.” People brainstorm symbols for Starbucks and summer. The computer automatically combines them based on shape. People judge the outputs, and tell the computer how to improve the image based on color, shape, or details.

Visiblends allows users to collaboratively generate visual blends with steps involving brainstorming, synthesis, and iteration. An evaluation of the workflow shows that decentralized groups can generate blends in independent microtasks, co-located groups can collaboratively make visual blends for their own messages, and VisiBlends improves novices’ ability to make visual blends.

Cross-platform Interactions and Popularity in the Livestreaming Community
Lauren Arnett Columbia University, Robert Netzorg Columbia University, Augustin Chaintreau Columbia University, Eugene Wu Columbia University

The paper explores how livestreamers on the platform, Twitch, may want to use other social media accounts to develop their audience.

Twitch is a social media platform owned by Amazon that focuses on livestreaming for video gamers. With opportunities for receiving donations and paid subscriptions, the most popular Twitch content creators can earn up to $500,000 per month.

Twitch users have social incentives to become popular, as passionate, tight-knit communities form in support of individual successful streamers. A common piece of advice for Twitch users looking to gain popularity is to have a social media presence and use it to promote themselves. The researchers looked into the validity of this claim to see if social media presence was in fact associated with popularity, to confirm if that is an effective way of promotion for streamers.

With data from over 17,000 streamers both directly from Twitch and scraped from social media platforms, different behaviors on social media platforms were examined to see if they correlated with popularity. For example, a Twitch user who develops a social media following before they started streaming may be able to transfer their popularity over to their Twitch profile. However, when compared with Twitch users who started a social media account after their first stream, the researchers were surprised to find that based on two measures of popularity, users who started Twitch with an existing social media presence did not experience a significant advantage over those who do not.

“Throughout this project, I’ve been amazed by the variety of people who stream and the lengths they go to for their audiences,” said Lauren Arnett, a computer science student. The livestreaming community extends all over the world and unites over particular games or streamers with a ton of enthusiasm. Streamers themselves spend the hours of an overtime work week playing for the entertainment of their viewers, creating a complex livestreaming schedule that will satisfy followers in any time zone. She continued, “although Twitch is a relatively unknown platform, the commitment and connections that exist on it run deep.”

Metaphoria an Algorithmic Companion for Metaphor Creation
Katy Ilonka Gero Columbia University and Lydia B. Chilton Columbia University

While computation has opened a floodgate of creative tools for music and the visual arts (like synthesizers, Photoshop, ChucK, SnapChat filters, and GANS) little of that fervor has transferred to the medium of text. Word processors that detect spelling and grammatical errors are useful, but do not support the creative elements of writing.

In this work, the researchers built a metaphor generation tool to help describe abstract concepts like ‘love’ or ‘consciousness’. Metaphors convey abstract ideas by comparing them to more concrete or familiar concepts, and are used in all forms of writing from poetry to journalism.

“Metaphoria is an online tool that generates metaphorical connections between two words,” said Katy Gero, a second year PhD student. “It is designed to give you ideas for poetry, essays, or stories. It augments your abilities, rather than replacing them.”

The tool can generate suggestions that make sense to the writer, given what the writer is looking to do. It can also make suggestions that inspire writers to come up with new and different ideas. The researchers found that Metaphoria almost always produces some connections that make sense, though it’s pretty dependent on the writer as to which connections make sense.

“Metaphoria rarely resulted in people writing very similar things; instead, Metaphoria inspires people to write things they might never have thought of,” said Gero.

Month: May 2019