Jonathan Gross Prize recognizes top CS graduates

The inaugural recipients of the Jonathan Gross Prize at the reception for Computer Science graduates. Mitchell Gouzenko, Gaurav Gite, Rudra Gopinath, Jonathan Gross, Yechiam Yemini, Lanting He, Filipe de Avila Belbute Peres (photo credit: Clarissa C. Pena)

The Jonathan Gross Prize is being awarded this year for the first time to honor students who graduate at the top of their class with a track record of promising innovative contributions to computer science. Five graduating computer science students—one senior from each of the four undergraduate schools and one student from the master’s program—were selected to receive the prize, which is named for Jonathan L. Gross, the popular professor emeritus who taught for 47 years at Columbia and cofounded the Computer Science Department. The prize is made possible by an endowment established by Yechiam (Y.Y.) Yemini, who for 31 years until his retirement in 2011, was a professor of computer science at Columbia.

 

Filipe de Avila Belbute Peres

Filipe de Avila Belbute Peres, Columbia College

We are fortunate to live in a time when machines are powerful enough to let us study intelligence by trying to reproduce it.”

Fascinated by computers and computer science from a young age, Filipe de Avila Belbute Peres started building his own PCs and swapping out parts while still a kid; he was programming by his teens. Yet when it came time for college, he applied, and was accepted, to medical school in his native Brazil. The choice reflected his developing interest in studying the biological aspects of intelligence. Upon realizing that he could tackle the problems of neuroscience through a computational/algorithmic approach rather than a biological one, he applied to study computer science in the US. “I found a way to join my different interests, and study the relationships between machine intelligence and human intelligence. That’s where my interest in artificial intelligence comes from.”

This summer he will work at Facebook AI Research, applying neural networks to natural language problems. In the fall, he begins his PhD program in Computer Science at Carnegie Mellon, where he will continue studying artificial intelligence and machine learning with an emphasis in neural nets and deep learning.

 

Mitchell Gouzenko

Mitchell Gouzenko, School of Engineering and Applied Science

“My first criteria when picking a major was to choose something that I could do for hours on end without getting tired of it.”

Majoring in computer science wasn’t Mitchell Gouzenko’s first choice, or even second. He came to Columbia fully intending to pursue Financial Engineering before becoming convinced that he wanted to study physics; he declared, however, for Applied Math. “I was all over the place,” he says now. In his sophomore year, his computer science classes started growing on him. “While taking Operating Systems with professor Jae Woo Lee, I finally realized how much I enjoy programming and knew it would be a huge mistake not to major in computer science.” Because programming was fun—employability concerns were secondary—he changed his major to computer science. He hasn’t looked back. His main interests are in operating systems and more generally, software architecture. “I love reasoning about large codebases and analyzing how the various components interface with each other, with attention to security, performance, and clarity of design.”

This September he starts work at Google where he easily sees himself working for the next five years. A PhD may someday also be in store, but it will first require narrowing his interests.

 

Rudra Gopinath

Rudra Gopinath, General Studies

“The advantage of a computer science education is in seeing patterns the human eye can’t see because there is too much data.”

For Rudra Gopinath, that data will be financial in nature as he prepares to take a position at Citigroup in New York City. By his own admission, it is an interesting choice for someone most excited by theoretical computer science. Taking classes with professors Mihalis Yannakakis and Xi Chen, he was “that student always asking questions in class and always showing up for office hours.” He gravitated particularly to computational complexity—and not because it offered answers. “We don’t know anything about anything in this field. We don’t even know what we know, as opposed to knowing what we don’t know. If I eventually come back to school, this is the field I still study.” But for now there is a major diversion as he moves to apply the hard skill of computer science to business and the financial markets, two fields that have also long interested him. “All the banks I interviewed said the same thing: we can teach you the finance part. What we need from you is what Columbia teaches—the algorithms, the tools to work with data.” Getting to Columbia was not a straight path. He did his freshman and sophomore years in Hong Kong before enrolling in a dual-degree program that allowed him to transfer to Columbia. “Columbia is one of the few Ivy Leagues where you can come in with that sort of flexibility and plug in seamlessly. The professors, the administration, individual people, were highly supportive. I came in and started taking computer science. It was almost as if I had been here all along.”

 

Lanting He, Barnard College

Lanting He

“A wisely designed algorithm can be succinct but unbelievably powerful and quick.”

Lanting He arrived on campus expecting to major in business or economics. Shut out of an economics class in the second semester of her first year, she opted instead for Intro to Computer Science. Never having previously done any coding, she was amazed she could quickly write a simple program to do something practical in real life. It was after taking Data Structures, where she learned to love the logic behind the design of data structures, that she decided to major in computer science. “I find algorithms and data structures to be the most fascinating things to learn, and brainstorming algorithms is the thing I enjoy most.”

This summer, she will do an internship as a strategist analyst at Goldman Sachs. In the fall, she returns to Columbia to pursue a master’s degree at Columbia in Operations Research, exploring in depth how computer science connects with other interdisciplinary studies.

Gaurav Gite

Gaurav Gite, Master’s student, School of Engineering and Applied Science

“I wish to develop computer systems to process, analyze and generate natural languages.”

Gaurav Gite’s interests lie in areas of natural language processing. After graduating from the Indian Institute of Technology (IIT), his desire to conduct research in the domain of artificial intelligence led him to pursue his master’s at Columbia University. He previously interned at Google, working with a research team to optimize the performance of neural networks. This helped in predicting the next words on keyboard inputs for Android devices. For his thesis, he worked with Kathleen McKeown in the Natural Language Processing (NLP) lab on the summarization of news articles, proposing new techniques based on the clustering of topics presented in the news article and salience of sentences. He also performed research at Columbia’s Center for Computational Learning System (CCLS) where he worked on automated methods to assess the content of written text and to grade student writing. Wise Crowd Content Assessment and Educational Rubrics, a research paper he coauthored while at the CCLS was published in the International Journal of Artificial Intelligence in Education in 2016.

After graduation, Gite will start working at Google, where he will continue working in the field of natural language processing.

 

Posted 5/17/2017

 

Query logs put to new uses in data analysis: democratizing interfaces and tracking sources of errors


Researchers led by Eugene Wu are presenting two papers next week in Chicago at SIGMOD, a leading international forum for database research. “Precision Interfaces” proposes a semi-automatic method for generating customized interfaces for real user tasks by analyzing logs of programs that are actually used for analysis. “QFix: Diagnosing Errors through Query Histories” is a data cleaning method that determines how errors entered a database, making it possible to diagnosis the reasons for the errors, uncover other possible errors, and help prevent their reoccurrence. Though very different applications, both papers rely on a unique and underutilized source of information: program and query logs.

 

An interface for every task

Eugene Wu

Data analysis is critical today to an increasing number of tasks. Modern data exploration and analysis tools (think visualization systems like Tableau, or form-based querying) make data analysis easier by providing interactive interfaces complete with buttons, sliders, checkboxes, and other widgets, rather than typing programs manually. Existing interfaces are designed for general tasks performed by many, but they often ignore the long tail of tasks and analyses that are less common but important for small number of people; for these long tail tasks, users either learn how to build interfaces for themselves, rely on existing tools ill-suited for the task, or make do without.

Precision Interfaces,” which is being presented at the Human In the Loop Data Analysis (HILDA) workshop, is intended to address this long tail by automatically generating data analysis interfaces that are “precise” to user tasks. The aim, according to Eugene Wu and his coauthors Haoci Zhang and Thibault Sellam, is to make data analysis accessible to more users so they can do exactly the data operations they want, and on the data they want.

The question of course is figuring out what people want to do when they themselves may have trouble articulating what they want. Operations on data constitute a potentially huge space, but query and program logs serve to constrain the possibilities by capturing what subsets of data that users are most interested in, what tasks they perform most often, and the sequence of those tasks.

In most cases, these logs show that people use data in fairly predictable and limited ways. Says Wu, “If you look at what people are doing it is not totally fancy new things. Any given task is usually associated with a small set of operations. The hard part identifying these task by perusing the history of what everyone has done, and providing interface that matches these tasks.”

Whether users are entering a range of numeric values (where a slider is appropriate), or simply toggling an option (when a checkbox suffices), can be gleaned from analyzing the program logs. By modeling each program (an SQL query or Python code that performs a task) as a node in an interaction graph, and drawing edges between nodes that differ in simple structural changes, researchers can detect similar changes and generate interfaces to express them.

Mapping two programs that differ in the region value to an interface component.

Aesthetics and ease of use are considerations also. Says Sellam, “An interface with only a text box (a console) is visually simple but requires users to basically program. Oppositely, an interface with one button for each program is too complicated for any non-trivial program log. How easily users can interact with the widgets matters too. Ultimately, the optimal design depends on the use case.”

Information in program logs helps constrain otherwise ill-defined, open ended problems. A similar observation is applied by Wu and coauthors Xiaolan Wang and Assistant Professor Alexandra Meliou (University of Massachusetts, Amherst) in analyzing database query logs to identify and repair errors in databases and data-driven applications.

Tracking the source of data errors

Data errors pose a huge cost to business and other entities, especially local governments. Transposing two digits in a numeric value—an income level, a gas meter reading, a phone number—can set in motion a slew of problems as customers or clients get taxed or billed for the wrong amount.  One error—an incorrect tax rate, for example—begets others as later queries retrieve the erroneous rate to calculate taxes for perhaps thousands of people, thus propagating the original error throughout records in the database while obscuring the source of the original error.

The usual fix—manually intervening to change individual errors after (and if) they are discovered—is ad hoc and superficial, and it fails to address the root cause.

QFix is designed to understand how an error entered and impacted the database so it can suggest repairs (e.g., fix the incorrect tax rate). Once a problem is reported, QFix examines the query log to analyze past queries that modified the database and then isolates the query that most likely introduced the initial error. The naive approach would be roll back to a given query, change it, and re-run the rest of the queries and see if it fixed the errors. This approach will not likely to finish within a lifetime for any realistic database.

The researchers’ approach is to think of each query as a set of linear equations that compute the database’s new values from the current values. By turning the queries in the query log into a large set of equations, they can be solved using modern linear constraint solvers. Says Wu, “If your database is a single value, say 10, and you know the incorrect linear equation (query) changed its value to 100 instead of 200, then we know how the linear equation (query) should be changed to fix the error.” QFix simply does this for millions of intertwined equations.

To generate the repairs quickly, the paper describes a set of optimizations to filter out queries, database records, and attributes that are unrelated to the data errors. This lets QFix identify erroneous queries in less than a second on standard industry database benchmark applications. QFix then suggests these queries and possible repairs to an administrator, who repair those queries and propagate the repairs to erase downstream errors—errors that might otherwise go unnoticed.

In using the information contained in query logs, QFix offers a fundamentally different method of data cleaning than anomaly/outlier detection, dimensionality reduction, and other traditional methods.

Both QFix and Precision Interfaces demonstrate how program and query logs are a rich resource for critical data science applications such as data cleaning, creating customized interfaces, and understanding in general what users want and need in their data analysis applications.

Posted 5/11/2017

Top students in computer science receive awards

In addition to awards given in previous years, the Jonathan Gross Prize for Student Excellence will for the first time honor students who graduate at the top of their class with the highest overall GPA and a track record of promising innovative contributions to computer science.

The awards for 2017 are as follows:

The Andrew P. Kosoresow Memorial Award for Excellence in Teaching and Service goes to both graduate students and undergraduates for outstanding contributions to teaching in the Department and exemplary service to the Department and its mission. Candidates are nominated by faculty and have typically have been selected for the SEAS Teaching Award. Criteria include evaluations by their supervising faculty and feedback from students in Courseworks or other forms.

Undergraduate level:  Kevin ChenZeynep Ejder
Graduate level: Julie Chien, Clement Canonne

 

The Certificate of Distinction for Academic Excellence is awarded at graduation to computer science and computer engineering majors who have an overall cumulative grade point average in the top 10% among graduating seniors in CS and CE. The certificates are also awarded to graduating seniors who have completed an undergraduate thesis. Generally recipients have a GPA of 3.9 or higher.

 

The Computer Science Department Award is a cash prize awarded to a degree candidate for scholastic achievements as a computer science major and as acknowledgement of contributions to the Department of Computer Science and to the University as a whole. Academic staff prepares a list of students with the best GPAs in the graduating class. Other factors include general scholastic excellence demonstrated through projects or class contributions. (Historically the award is given to B.A. students.) The recipients for 2017 are:

Weston Jackson (GPA 4.13, 2nd overall, 1st among BA)
Yubo Han (GPA 4.12, 3rd overall, 2nd among BA)

 

The Computer Science Scholarship Award, Excellence in Computer Science. Academic staff prepares a list of students with the best GPAs in the graduating class, while factoring in also general scholastic excellence demonstrated through projects or class contributions. (Historically given to BS students.) The recipients for 2017 are:

Noah Zweben (GPA 4.19, 1st overall)
David Hao (GPA 4.09, 4th overall, 2nd among BS)

 

The Russell C. Mills Award, established by the computer science department in 1992 in memory of Russell C. Mills, is a cash prize given to a computer science major who has exhibited excellence in the area of computer science. The recipients for 2017 are:

Ruoxin Jiang (GPA 4.09, 5th overall)
Miranda Li (GPA 4.08, 6th overall)

 

The Theodore R. Bashkow Award, is a cash prize presented to a computer science senior who has excelled in independent projects. It is awarded in honor of Professor Theodore R. Bashkow, whose contributions as a researcher, teacher, and consultant have significantly advanced the state of the art of computer science. The recipients for 2017 are:

Tera Blevins
Ruoxin Jiang

 

The Jonathan L. Gross Prize for Academic Excellence was established in 2017 in honor of Jonathan Gross, the much loved professor emeritus who taught within the Department from its establishment. Each year a cash gift is awarded to one graduating masters student and to one graduating senior from each of the four undergraduate schools served by the Computer Science Department. The recipients for 2017 are:

Lanting He, Barnard
Rudra Gopinath, General Studies
Filipe de Avila Belbute Peres, Columbia College
Mitchell Gouzenko, SEAS
Gaurav Gite, master’s student, SEAS

 

The Paul Charles Michelman Memorial Award for Exemplary Service is given to a PhD student in Computer Science who has devoted time and effort beyond what is expected to further the department’s goals. It is given in memory of Dr. Paul Michelman, ’93, who devoted himself to improving the department through service while excelling as a researcher. This year’s recipient is:

Clement Canonne

 

The following graduate-level students will receive a CS Service Award for their outstanding performance and exemplary service to the CS@CU community:

Marshall Ball
Clement Canonne
Robert Colgan
Stuart Hadfield
Luke Kowalczyk
Dingzeyu Li
Andrea Lottarini
Kevin Shi
Ana Stoica
Olivia Winn

 

Posted 5/10/2017