Eleanor Lin And Walter McKelvie Selected For Honorable Mention For The Outstanding Undergraduate Researcher Award

Two CS students were selected by the Computing Research Association (CRA) for the 2024 Outstanding Undergraduate Researcher Award for their exemplary dedication to research and academic excellence, earning them a well-deserved commendation. The honorees, Eleanor Lin and Walter McKelvie, have exhibited exceptional skills and commitment in their respective areas of focus within computer science.

Eleanor LinEleanor Lin (CC ‘24), distinguished herself through groundbreaking research with the Spoken Language Processing Group, where she is advised by Professor Julia Hirschberg. Her work as the lead researcher on the Switchboard Dialogue Act Re-alignment project has showcased innovation and contributed significantly to updating the corpus used to identify regional differences in U. S. speakers–extremely important for Automatic Speech Recognition, particularly in telephony. Eleanor made substantial contributions to multiple Speech Lab projects while concurrently serving as a teaching assistant for computer science and linguistics. She also collaborated with researchers from Rice University, the University of Southern California, and Teacher’s College.

Walter MckelvieWalter McKelvie (SEAS ‘24), earned an honorable mention for their remarkable work in theoretical computer science and cryptography. He worked with Professor Tal Malkin and the Crypto Lab on fixing a problem with proof-of-stake blockchains, making a secret leader election “accountable” so that leaders cannot anonymously refuse to publish a block. His dedication to pushing the boundaries of understanding in this field has been commendable, and he greatly contributed to the research by coming up with one of the three paradigms included in the paper and writing several of the technical parts in the paper. McKelvie additionally served as a teaching assistant and collaborated with researchers from Purdue and Harvard.

The honorable mentions serve as a testament to the vibrant research community of the department, where students are encouraged to explore and excel in their chosen fields. Julia Hirschberg, the Percy K. and Vida L. W. Hudson Professor of Computer Science, assembles a team of 15 undergrads with different skills to work on the Speech Lab’s projects. Students can work on data collection and annotation, building large language models (LLMs), or both. Professor Tal Malkin typically has one or two undergraduate students who work on cryptography research. Students need to have mathematical maturity; ideally, they should have taken Malkin’s graduate-level Introduction to Cryptography class.

These recognitions also highlight the department’s commitment to providing students with a robust academic environment that encourages curiosity, creativity, and a passion for discovery.

Amey Pasarkar Recognized By the Computing Research Association’s (CRA) Outstanding Undergraduate Researcher Award

Amey Pasarkar has been selected as a Finalist for his work on computational models that examine the microbiome and its role in health and disease. 

Directional Gaussian Mixture Models of the Gut Microbiome Elucidate Microbial Spatial Structure
Amey P. Pasarkar, Tyler A. Joseph, Itsik Pe’er

The human gut microbiome contains trillions of bacteria that play a key role in preventing and fighting disease. However, the behaviors of these bacteria are still not well understood. One of these behaviors is the spatial arrangement of microbes in the gut. Numerous experimental studies have hinted that the various unique environments along the gut lead to certain bacterial spatial patterns. The research team created a consolidated computational model of the microbiome’s spatial arrangement. Their model can identify many of the previously discovered patterns and propose new ones as well. For example, the models found that the pH levels in the small intestine are likely promoting the growth of a specific family of bacteria.

The results demonstrate that the gut microbiome, while exceptionally large, has predictable and quantifiable spatial patterns that can be used to help us understand its role in health and disease.

Three CS Students Recognized By The Computing Research Association

For this year’s Outstanding Undergraduate Researcher Award, Payal Chandak, Sophia Kolak, and Yanda Chen were among students recognized by the Computing Research Association (CRA) for their work in an area of computing research.

Payal Chandak

Using Machine Learning to Identify Adverse Drug Effects Posing Increased Risk to Women
Payal Chandak Columbia University, Nicholas Tatonetti Columbia University

The researchers developed AwareDX – Analysing Women At Risk for Experiencing Drug toXicity – a machine learning algorithm that identifies and predicts differences in adverse drug effects between men and women by analyzing 50 years’ worth of reports in an FDA database. The algorithm automatically corrects for biases in these data that stem from an overrepresentation of male subjects in clinical research trials.

Though men and women can have different responses to medications – the sleep aid Ambien, for example, metabolizes more slowly in women, causing next-day grogginess – doctors may not know about these differences because most clinical trial data itself is biased toward men. This trickles down to impact prescribing guidelines, drug marketing, and ultimately, patients’ health. Unfortunately, pharmaceutical companies have a history of ignoring complex problems and clinical trials have singularly studied men, not even including women. As a result, there is a lot less information about how women respond to drugs compared to men. The research tries to bridge this information gap. 

Buy Weed Seeds from premium breeders of the best marijuana strains. Discreetly ships across the USA with free weed seeds in every order.

Sophia Kolak

It Takes a Village to Build a Robot: An Empirical Study of The ROS Ecosystem
Sophia Kolak Columbia University, Afsoon Afzal Carnegie Mellon University, Claire Le Goues Carnegie Mellon University, Michael Hilton Carnegie Mellon University, Christopher Steven Timperley Carnegie Mellon University

The Robot Operating System (ROS) is the most popular framework for robotics development. In this paper, the researchers conducted the first major empirical study of ROS, with the goal of understanding how developers collaborate across the many technical disciplines that coalesce in robotics.

Building a complete robot is a difficult task that involves bridging many technical disciplines. ROS aims to simplify development by providing reusable libraries, tools, and conventions for building a robot. Still, as building a robot requires domain expertise in software, mechanical, and electrical engineering, as well as artificial intelligence and robotics, ROS faces knowledge-based barriers to collaboration. The researchers wanted to understand how the necessity of domain-specific knowledge impacts the open-source collaboration model in ROS.

Virtually no one is an expert in every subdomain of robotics: experts who create computer vision packages likely need to rely on software designed by mechanical engineers to implement motor control. As a result, the researchers found that development in ROS is centered around a few unique subgroups each devoted to a different specialty in robotics (i.e. perception, motion). This is unlike other ecosystems, where competing implementations are the norm.

Detecting Performance Patterns with Deep Learning
Sophia Kolak Columbia University

Performance has a major impact on the overall quality of a software project. Performance bugs—bugs that substantially decrease run-time—have long been studied in software engineering, and yet they remain incredibly difficult for developers to handle. In this project, the researchers leveraged contemporary methods in machine learning to create graph embeddings of Python code that can be used to automatically predict performance.

Using un-optimized programming language concepts can lead to performance bugs and the researchers hypothesized that statistical language embeddings could help reveal these patterns. By transforming code samples into graphs that captured the control and data flow of a program, the researchers studied how various unsupervised embeddings of these graphs could be used to predict performance.  

Implementing “sort” by hand as opposed to using the built-in Python sort function is an example of a choice that typically slows down a program’s run-time. When the researchers embedded the AST and data flow of a code snippet in Euclidean space (using DeepWalk), patterns like this were captured in the embedding and allowed classifiers to learn which structures are correlated with various levels of performance.   

I was surprised by how often research changes directions,” said Sophia Kolak. In both projects, they started out with one set of questions but answered completely different ones by the end. “It showed me that, in addition to persistence, research requires open-mindedness.”


Yanda Chen
Honorable Mention

Cross-language Sentence Selection Via Data Augmentation and Rationale Training
Yanda Chen Columbia University, Chris Kedzie Columbia University, Suraj Nair University of Maryland, Petra Galuscakova University of Maryland, Rui Zhang Yale University, Douglas Oard University of Maryland, and Kathleen McKeown Columbia University

In this project, the researchers proposed a new approach to cross-language sentence selection, where they used models to predict sentence-level query relevance with English queries over sentences within document collections in low-resource languages such as Somali, Swahili, and Tagalog. 

The system is used as part of cross-lingual information retrieval and query-focused summarization system. For example, if a user puts in a query word “business activity” and specifies Swahili as the language of source documents, then the system will automatically retrieve the Swahili documents that are related to “business activity” and produce short summaries that are then translated from Swahili to English. 

A major challenge of the project was the lack of training data for low-resource languages. To tackle this problem, the researchers proposed to generate a relevance dataset of query-sentence pairs through data augmentation based on parallel corpora collected from the web. To mitigate the spurious correlations learned by the model, they proposed the idea of rationale training where they first trained a phrase-based statistical machine translation system and used the alignment information to provide additional supervision for the models. 

The approach achieved state-of-the-art results on both text and speech across three languages – Somali, Swahili, and Tagalog. 


CS Undergrads Recognized by the Computing Research Association

For this year’s Outstanding Undergraduate Researcher Award, three computer science students received honorable mentions – Lalita Devadas, Dave Epstein, and Jessy Xinyi Han. The Computing Research Association (CRA) recognized the undergraduates for their work in an area of computing research.

Secure Montgomery Multiplication and Repeated Squares for Modular Exponentiation
Lalita Devadas Columbia University and Justin Bloom Oregon State University

The researchers worked on using some recent advances in garbling of arithmetic circuits for secure exponentiation mod N, a vital operation in many cryptosystems, including in the RSA public-key cryptosystem.

A garbled circuit is a cryptographic protocol which allows for secure two-party computation, in which two parties, Alice and Bob, each with a private input, want to compute some shared function of their inputs without either party learning the other’s input.

Their novel approach implemented the Montgomery multiplication method, which uses clever arithmetic to avoid costly division by the modulus being multiplied in. The best method they found had each wire in a circuit representing one digit of a number in base p. They developed a system of base p arithmetic which is asymptotically more efficient in the given garbled circuit architecture than any existing protocols. 

They measured performance for both approaches by counting the ciphertexts communicated for a single multiplication (a typical measure of efficiency for garbled circuit operations). They found that the base p Montgomery multiplication implementation vastly outperformed all other implementations for values of N with bit length greater than 500 (i.e., all N used for applications like RSA encryption).

“Unfortunately, our best implementations showed only incremental improvement over existing non-Montgomery-based implementations for values of N used in practice,” said Lalita Devadas. “We are still looking into further optimizations using Montgomery multiplication.”

Secure multiparty computation has many applications outside of computer science. For example, suppose five friends want to know their cumulative net worth without anyone learning anyone else’s individual net worth. This is actually a secure computation problem, since the friends want to perform some computation of their inputs while keeping said inputs private from other parties.

Oops! Predicting Unintentional Action in Video
Dave Epstein Columbia University, Boyuan Chen Columbia University, and Carl Vondrick Columbia University

The paper trains models to detect when human action is unintentional using self-supervised computer vision, an important step towards machines that can intelligently reason about the intentions behind complex human actions.

Despite enormous scientific progress over the last five to ten years, machines still struggle with tasks learned quickly and autonomously by young children, such as understanding human behavior or learning to speak a language. Epstein’s research tackles these types of problems by using self-supervised computer vision, a paradigm that predicts information naturally present in large amounts of input data such as images or videos. This stands in contrast with supervised learning, which relies on humans manually labelling data (e.g. “this is a picture of a dog”).

“I was surprised to learn that failure is an expected part of research and that it can take a long time to realize you’re failing,” said Dave Epstein. “Taking a failed idea, identifying the promising parts, and trying again leads to successful research.”

Seeding Network Influence in Biased Networks and the Benefits of Diversity
Ana-Andreea Stoica Columbia University, Jessy Xinyi Han Columbia University, and Augustin Chaintreau Columbia University

The paper explores the problem of social influence maximization and how information is diffused in a social network.

For example, it might be about what kind of news people read on social media, how many people know about job opportunities or who hears about the latest loan options from a bank. So given a social network, classical algorithms are focused on picking the best k early-adopters based on how central they are in a network, say, based on their number of connections, to maximize outreach.

However, since social inequalities are reflected in the uneven networks, classical algorithms which ignore demographics often amplify such inequalities in information access.

“We were wondering if we can do better than an algorithm that ignores demographics,” said Jessy Xinyi Han. “‘Better’ here means more people in total and more people from the disadvantaged group can receive the information.”

Through a network model with unequal communities, they developed new heuristics to take demographics into account, showing that including sensitive features in the input of most natural seed selection algorithms substantially improves diversity but also often leaves efficiency untouched or even provides a small gain.

Such analytical condition turned out to be a closed-form condition on the number of early adopters. They also validated this result on the real CS co-authorship network from DBLP.