Three CS Students Recognized By The Computing Research Association

Payal Chandak, Sophia Kolak, and Yanda Chen were among the students recognized by the Computing Research Association (CRA) in this year’s Outstanding Undergraduate Researcher Award competition for their contributions to computing research.


Payal Chandak
Finalist

Using Machine Learning to Identify Adverse Drug Effects Posing Increased Risk to Women
Payal Chandak (Columbia University), Nicholas Tatonetti (Columbia University)

The researchers developed AwareDX – Analysing Women At Risk for Experiencing Drug toXicity – a machine learning algorithm that identifies and predicts differences in adverse drug effects between men and women by analyzing 50 years’ worth of reports in an FDA database. The algorithm automatically corrects for biases in these data that stem from an overrepresentation of male subjects in clinical research trials.
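The published pipeline is more involved, but the core statistical move, balancing the sexes by resampling and then comparing adverse-event disproportionality between men and women, can be sketched in a few lines. The sketch below is illustrative only: the column names and the use of the proportional reporting ratio are assumptions, not necessarily the paper’s exact method.

```python
import pandas as pd

def balanced_sample(reports: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Correct for male overrepresentation by downsampling to equal counts per sex."""
    n = reports["sex"].value_counts().min()
    return (reports.groupby("sex", group_keys=False)
                   .apply(lambda g: g.sample(n, random_state=seed)))

def sex_specific_prr(reports: pd.DataFrame, drug: str, effect: str, sex: str) -> float:
    """Proportional reporting ratio for one drug-effect pair within one sex.

    `reports` is assumed to hold one row per adverse-event report with
    columns 'sex', 'drug', and 'effect' (a simplified stand-in for the
    FDA reporting database analyzed in the paper).
    """
    r = reports[reports["sex"] == sex]
    a = ((r["drug"] == drug) & (r["effect"] == effect)).sum()  # drug & effect
    b = ((r["drug"] == drug) & (r["effect"] != effect)).sum()  # drug, other effects
    c = ((r["drug"] != drug) & (r["effect"] == effect)).sum()  # other drugs, effect
    d = ((r["drug"] != drug) & (r["effect"] != effect)).sum()  # neither
    if a + b == 0 or c + d == 0 or c == 0:
        return float("nan")   # not enough reports to form the ratio
    return (a / (a + b)) / (c / (c + d))
```

Comparing the ratio across many balanced resamples indicates whether a higher female value is a stable signal rather than an artifact of sample composition.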

Though men and women can have different responses to medications – the sleep aid Ambien, for example, metabolizes more slowly in women, causing next-day grogginess – doctors may not know about these differences because most clinical trial data is biased toward men. This trickles down to impact prescribing guidelines, drug marketing, and ultimately, patients’ health. Historically, clinical trials have disproportionately enrolled men and at times excluded women altogether, so far less is known about how women respond to drugs. This research tries to bridge that information gap.



Sophia Kolak
Finalist

It Takes a Village to Build a Robot: An Empirical Study of The ROS Ecosystem
Sophia Kolak (Columbia University), Afsoon Afzal (Carnegie Mellon University), Claire Le Goues (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University), Christopher Steven Timperley (Carnegie Mellon University)

The Robot Operating System (ROS) is the most popular framework for robotics development. In this paper, the researchers conducted the first major empirical study of ROS, with the goal of understanding how developers collaborate across the many technical disciplines that coalesce in robotics.

Building a complete robot is a difficult task that involves bridging many technical disciplines. ROS aims to simplify development by providing reusable libraries, tools, and conventions for building a robot. Still, as building a robot requires domain expertise in software, mechanical, and electrical engineering, as well as artificial intelligence and robotics, ROS faces knowledge-based barriers to collaboration. The researchers wanted to understand how the necessity of domain-specific knowledge impacts the open-source collaboration model in ROS.

Virtually no one is an expert in every subdomain of robotics: experts who create computer vision packages likely need to rely on software designed by mechanical engineers to implement motor control. As a result, the researchers found that development in ROS is centered on a few unique subgroups, each devoted to a different specialty in robotics (e.g., perception, motion). This is unlike other ecosystems, where competing implementations are the norm.

Detecting Performance Patterns with Deep Learning
Sophia Kolak (Columbia University)

Performance has a major impact on the overall quality of a software project. Performance bugs—bugs that substantially decrease run-time—have long been studied in software engineering, and yet they remain incredibly difficult for developers to handle. In this project, the researchers leveraged contemporary methods in machine learning to create graph embeddings of Python code that can be used to automatically predict performance.

Using un-optimized programming language concepts can lead to performance bugs and the researchers hypothesized that statistical language embeddings could help reveal these patterns. By transforming code samples into graphs that captured the control and data flow of a program, the researchers studied how various unsupervised embeddings of these graphs could be used to predict performance.  

Implementing “sort” by hand as opposed to using the built-in Python sort function is an example of a choice that typically slows down a program’s run-time. When the researchers embedded the AST and data flow of a code snippet in Euclidean space (using DeepWalk), patterns like this were captured in the embedding and allowed classifiers to learn which structures are correlated with various levels of performance.   
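As a rough illustration of this approach (not the authors’ exact pipeline), one can parse a snippet into an AST graph, embed it with a miniature DeepWalk (random walks fed to a skip-gram model), and average the node vectors into a fixed-size feature vector for a performance classifier. The graph below uses only the AST, whereas the paper also captures control and data flow; all names and parameters are illustrative.

```python
import ast
import random
import numpy as np
import networkx as nx
from gensim.models import Word2Vec

def ast_graph(source: str) -> nx.Graph:
    """Parse Python source and connect each AST node to its children."""
    tree = ast.parse(source)
    g = nx.Graph()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            g.add_edge(id(parent), id(child))
            g.nodes[id(parent)]["label"] = type(parent).__name__
            g.nodes[id(child)]["label"] = type(child).__name__
    return g

def deepwalk_embedding(g: nx.Graph, walks_per_node=10, walk_len=20, dim=64):
    """DeepWalk in miniature: random walks fed to a skip-gram model."""
    walks = []
    for node in g.nodes:
        for _ in range(walks_per_node):
            walk, cur = [node], node
            for _ in range(walk_len - 1):
                cur = random.choice(list(g.neighbors(cur)))
                walk.append(cur)
            # Use node-type labels as "words" so embeddings transfer across programs.
            walks.append([g.nodes[n]["label"] for n in walk])
    model = Word2Vec(walks, vector_size=dim, window=5, min_count=1, sg=1)
    # Average node-label vectors into one fixed-size vector per program.
    return np.mean([model.wv[g.nodes[n]["label"]] for n in g.nodes], axis=0)
```

A downstream classifier (say, logistic regression over these vectors) can then be trained on code snippets labeled by their measured run-time.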

“I was surprised by how often research changes directions,” said Sophia Kolak. In both projects, she started out with one set of questions but answered completely different ones by the end. “It showed me that, in addition to persistence, research requires open-mindedness.”

 


Yanda Chen
Honorable Mention

Cross-language Sentence Selection Via Data Augmentation and Rationale Training
Yanda Chen (Columbia University), Chris Kedzie (Columbia University), Suraj Nair (University of Maryland), Petra Galuscakova (University of Maryland), Rui Zhang (Yale University), Douglas Oard (University of Maryland), and Kathleen McKeown (Columbia University)

In this project, the researchers proposed a new approach to cross-language sentence selection, where they used models to predict sentence-level query relevance with English queries over sentences within document collections in low-resource languages such as Somali, Swahili, and Tagalog. 

The system is used as part of a cross-lingual information retrieval and query-focused summarization pipeline. For example, if a user enters the query “business activity” and specifies Swahili as the language of the source documents, the system automatically retrieves the Swahili documents related to “business activity” and produces short summaries that are then translated from Swahili to English.

A major challenge of the project was the lack of training data for low-resource languages. To tackle this problem, the researchers proposed to generate a relevance dataset of query-sentence pairs through data augmentation based on parallel corpora collected from the web. To mitigate the spurious correlations learned by the model, they proposed the idea of rationale training where they first trained a phrase-based statistical machine translation system and used the alignment information to provide additional supervision for the models. 
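A toy version of the augmentation step might look like the sketch below, which assumes a sentence-aligned parallel corpus; the content-word heuristic and the negative sampling are simplifications of the paper’s procedure.

```python
import random

def synthesize_relevance_pairs(parallel, n_negatives=1, seed=0):
    """Build (query, sentence, label) triples from a parallel corpus.

    `parallel` is a list of (english_sentence, foreign_sentence) pairs.
    A content word drawn from the English side serves as a pseudo-query
    whose aligned foreign sentence is labeled relevant; random other
    foreign sentences serve as (probably) irrelevant negatives.
    """
    rng = random.Random(seed)
    data = []
    for en, fo in parallel:
        words = [w for w in en.lower().split() if len(w) > 3]  # crude content-word filter
        if not words:
            continue
        query = rng.choice(words)
        data.append((query, fo, 1))                  # positive: the aligned sentence
        for _ in range(n_negatives):                 # negatives: random sentences
            data.append((query, rng.choice(parallel)[1], 0))
    return data
```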

The approach achieved state-of-the-art results on both text and speech across three languages – Somali, Swahili, and Tagalog. 

 

13 Papers From the Department Accepted to NeurIPS 2020

Researchers from the Theory, Security & Privacy, and Machine Learning groups will present their papers at the 34th Conference on Neural Information Processing Systems (NeurIPS), one of the premier conferences for machine learning.

 

Smoothed Analysis of Online and Differentially Private Learning
Nika Haghtalab (Cornell University), Tim Roughgarden (Columbia University), Abhishek Shetty (Cornell University)

Differentially private computation and online learning over time have long been known to be much more difficult in a worst-case environment (with an adversary choosing the input) than in an average-case environment (with inputs drawn from an unknown probability distribution). 

This work shows that in the “smoothed analysis” model, in which an adversarially chosen input is perturbed slightly by nature, private computation and online learning are almost as easy as in an average-case setting. The work also features a novel application of bracketing entropy to regret minimization in online learning.

 

HYDRA: Pruning Adversarially Robust Neural Networks
Vikash Sehwag (Princeton University), Shiqi Wang (Columbia University), Prateek Mittal (Princeton University), Suman Jana (Columbia University)

In safety-critical but computationally resource-constrained applications, deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size (often millions of parameters). To overcome these two challenges jointly, the authors propose to make pruning techniques aware of the robust training objective and let that objective guide the search for which connections to prune. They realize this insight by formulating the pruning objective as an empirical risk minimization problem, which is solved efficiently using stochastic gradient descent. They further demonstrate that the proposed approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy simultaneously on various vision tasks.

The authors show that it is crucial to make pruning techniques aware of the robust training objective in order to obtain neural networks that are both compact and robust. They demonstrate the success of HYDRA across three popular vision datasets with four state-of-the-art robust training techniques. Interestingly, they also show that HYDRA can uncover highly robust sub-networks inside non-robust networks.
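A minimal sketch of the central mechanism, learning an importance score per connection by backpropagating the training loss through a top-k mask, might look like this in PyTorch. It is illustrative only: the layer shape, initialization, and straight-through gradient trick are assumptions, not the paper’s exact formulation, and the robust (adversarial) loss is supplied by whatever training objective one plugs in.

```python
import torch
import torch.nn as nn

class ScoredLinear(nn.Module):
    """Linear layer whose connections are pruned via learned scores.

    The weights stay frozen; only `scores` are trained against the task
    or adversarial loss, so the loss itself guides which connections
    survive, which is the core idea of robustness-aware pruning.
    """
    def __init__(self, in_f: int, out_f: int, sparsity: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01,
                                   requires_grad=False)
        self.scores = nn.Parameter(torch.rand(out_f, in_f))
        self.k = max(1, int((1 - sparsity) * in_f * out_f))  # connections kept

    def forward(self, x):
        thresh = self.scores.flatten().topk(self.k).values.min()
        mask = (self.scores >= thresh).float()
        # Straight-through estimator: use the hard mask in the forward pass,
        # but let gradients flow to `scores` as if the mask were identity.
        masked_w = self.weight * (mask + self.scores - self.scores.detach())
        return x @ masked_w.t()
```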

 

Hedging in Games: Faster Convergence of External and Swap Regrets
Xi Chen (Columbia University), Binghui Peng (Columbia University)

Online algorithms for regret minimization play an important role in many applications where real-time sequential decision making is crucial. The paper studies the performance of the classic Hedge algorithm and its optimistic variant under the setting of repeated multiplayer games. The highlights are improved analyses on regrets that lead to faster convergence to (coarse) correlated equilibria.
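For reference, the classic Hedge update is compact enough to sketch directly; the optimistic variant studied in the paper additionally shifts each update by the most recent loss vector, used as a prediction of the next one.

```python
import numpy as np

def hedge(loss_stream, eta=0.1):
    """Classic Hedge (multiplicative weights) over n actions.

    `loss_stream` yields loss vectors in [0, 1]^n, one per round; this
    generator yields the probability distribution played each round.
    """
    weights = None
    for loss in loss_stream:
        loss = np.asarray(loss, dtype=float)
        if weights is None:
            weights = np.ones_like(loss)          # start uniform over actions
        yield weights / weights.sum()
        weights = weights * np.exp(-eta * loss)   # downweight lossy actions
```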

 

Planning with General Objective Functions: Going Beyond Total Rewards
Ruosong Wang (Carnegie Mellon University), Peilin Zhong (Columbia University), Simon Du (University of Washington), Russ Salakhutdinov (Carnegie Mellon University), Lin Yang (University of California, Los Angeles)

This work studies a general class of problems in reinforcement learning: it proposes a planning algorithm for deterministic systems with general objective functions.

The algorithm is based on layering techniques from the sketching and streaming literature; this is the first time the layering technique has appeared in sequential decision-making algorithms. The work also proposes a new dynamic-programming framework for solving deterministic systems.

 

Ensuring Fairness Beyond the Training Data
Debmalya Mandal (Columbia University), Samuel Deng (Columbia University), Suman Jana (Columbia University), Jeannette Wing (Columbia University), Daniel Hsu (Columbia University)

This paper initiates the study of fair classifiers that are robust to perturbations to the training distribution. Despite recent progress, the literature on fairness in machine learning has largely ignored the design of such fair and robust classifiers. Instead, fairness in these previous works was evaluated on sample datasets that may be unreliable due to sampling bias and missing attributes.

The authors of the present paper provide a new algorithm for training classifiers that are fair not only with respect to the training data, but also with respect to weight perturbations of the training data, which may account for discrepancies due to sampling bias and missing attributes. The approach is based on a min-max objective that is solved using a new iterative algorithm based on online learning. Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, the robustly-fair classifiers retain fairness guarantees and test accuracy for a large class of perturbations on the test set.
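A generic loop of this min-max flavor (not the paper’s specific algorithm) pits a sample-reweighting adversary against the learner, with the adversary running multiplicative weights over the training examples:

```python
import numpy as np

def reweighted_minmax(train_step, fairness_violation, X, y, groups,
                      rounds=100, eta=0.5):
    """Min-max training sketch: learner vs. sample-reweighting adversary.

    `train_step(X, y, w)` fits a model on weighted data and returns it;
    `fairness_violation(model, X, y, groups)` returns a per-example score
    that is large where the model is unfair. Both are placeholders for
    whatever model and fairness metric one plugs in.
    """
    w = np.ones(len(y)) / len(y)
    model = None
    for _ in range(rounds):
        model = train_step(X, y, w)                   # learner: best response
        v = fairness_violation(model, X, y, groups)   # adversary's signal
        w = w * np.exp(eta * v)                       # multiplicative-weights step
        w = w / w.sum()
    return model
```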

 

Causal Imitation Learning With Unobserved Confounders
Junzhe Zhang (Columbia University), Daniel Kumor (Purdue University), Elias Bareinboim (Columbia University)

One of the common ways children learn is by mimicking adults. If an agent is able to perfectly copy the behavior of a human expert, one may be tempted to surmise that the agent will perform equally well at the same type of task.

The paper “Causal Imitation Learning with Unobserved Confounders” shows that this notion is somewhat naive, and the agent would need to have exactly the same sensory inputs as the human for perfect imitation to translate into the same distribution of rewards. Perhaps surprisingly, if its inputs differ even slightly from the demonstrator’s, the agent’s performance can be arbitrarily low, even when it learns to copy the demonstrator’s actions perfectly.

This paper investigates imitation learning through a causal lens and reveals why the agent may fail to learn even when infinitely many demonstration trajectories are available. It then introduces a formal treatment of the problem, including clearly formulated conditions for when imitation learning can succeed. These criteria advance our understanding of the situations and assumptions under which imitation learning is feasible. Finally, it introduces a new algorithm that lets the agent learn a strategy based on its own sensory capabilities, yet with performance identical to the expert’s.

 

Learning Causal Effects via Weighted Empirical Risk Minimization
Yonghan Jung (Purdue University), Jin Tian (Iowa State University), Elias Bareinboim (Columbia University)

Learning causal effects from data is a fundamental problem across the empirical sciences. Determining the identifiability of a target effect from a combination of the observational distribution and the causal graph underlying a phenomenon is well understood in theory. In practice, however, it remains a challenge to apply the identification theory to estimate the identified causal functionals from finite samples. Although a plethora of practical estimators has been developed for the setting known as the back-door, also called conditional ignorability, there is still no systematic way of estimating arbitrary causal functionals that is both computationally and statistically attractive.

This work aims to bridge the gap from causal identification to causal estimation. The authors note that estimating functionals from limited samples based on the empirical risk minimization (ERM) principle is pervasive in the machine learning literature, and that these methods have been extended to causal inference under the back-door setting. In this paper, they introduce a learning framework that marries the two families of methods, benefiting from the generality of causal identification theory and the effectiveness of estimators produced under the ERM principle. Specifically, they develop a sound and complete algorithm that generates causal functionals in the form of weighted distributions that are amenable to ERM optimization. They then provide a practical procedure for learning causal effects from finite samples and a causal graph. Finally, experimental results support the effectiveness of the approach.
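In the simplest back-door setting, the idea of a causal functional expressed as a weighted distribution reduces to familiar inverse-propensity weighting, which off-the-shelf ERM learners accept directly. A sketch for a binary treatment follows; the paper’s algorithm generalizes far beyond this classic conditional-ignorability case.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_effect(X, t, y):
    """Estimate E[Y | do(T=1)] - E[Y | do(T=0)] under back-door adjustment.

    Each sample is weighted by 1 / P(T = t_i | X = x_i), turning the
    observational data into a weighted distribution from which the
    interventional means can be read off.
    """
    propensity = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    p = np.where(t == 1, propensity, 1 - propensity)
    w = 1.0 / np.clip(p, 1e-3, None)       # clip to stabilize extreme weights
    y1 = np.average(y[t == 1], weights=w[t == 1])
    y0 = np.average(y[t == 0], weights=w[t == 0])
    return y1 - y0
```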

 

Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning
Amin Jaber (Purdue University), Murat Kocaoglu (MIT-IBM Watson AI Lab), Karthikeyan Shanmugam (MIT-IBM Watson AI Lab), Elias Bareinboim (Columbia University)

One fundamental problem in the empirical sciences is that of reconstructing the causal structure that underlies a phenomenon of interest through observation and experimentation. While there exists a plethora of methods capable of learning the equivalence class of causal structures compatible with observations, it is less well understood how to systematically combine observations and experiments to reconstruct the underlying structure.

This work investigates the task of structural learning from a combination of observational and experimental data when the interventional targets are unknown. The relaxation of not knowing the interventional targets is quite practical since in some fields, such as molecular biology, we are unable to identify the variables affected by interventions.

The topology of a causal structure imprints a set of non-parametric constraints on the generated distributions, one type of which is conditional independence. This work derives a graphical characterization that allows one to test whether two causal structures are indistinguishable with respect to the constraints in the available data. As a result, an algorithm is developed that is capable of harnessing the collected data to learn the equivalence class of the true causal structure. The algorithm is proven to be complete, in the sense that it is the most informative in the sample limit.

 

Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe
Sanghack Lee (Columbia University), Elias Bareinboim (Columbia University)

Intelligent agents are continuously faced with the challenge of optimizing a policy based on what they can observe (see) and which actions they can take (do) in the environment where they are deployed. Most policies can be parametrized in terms of these two dimensions, i.e., as a function of what can be seen and done given a certain situation, which we call a mixed policy.

This work investigates several properties of the class of mixed policies and provides a characterization with respect to criteria such as optimality and non-redundancy. In particular, the authors introduce a graphical criterion to identify unnecessary contexts for a set of actions, leading to a natural characterization of the non-redundancy of mixed policies. They then derive sufficient conditions under which one strategy dominates another with respect to its maximum achievable expected reward. This characterization leads to a fundamental understanding of the space of mixed policies and to possible refinements of the agent’s strategy so that it converges to the optimum faster and more robustly. One surprising result of this causal characterization is that an agent following the standard, non-causal approach (i.e., intervening on all intervenable variables and observing all available contexts) may be hurting itself and may never achieve optimal performance, regardless of the number of interactions performed.

 

General Transportability of Soft Interventions: Completeness Results
Juan Correa (Columbia University), Elias Bareinboim (Columbia University)

The challenge of generalizing causal knowledge across different environments is pervasive in scientific exploration, including in artificial intelligence, machine learning, and data science. Scientists collect data and carry out experiments in a particular environment with the intent, almost invariably, of using them for applications deployed under different conditions. For instance, consider a rover trained in the California desert. After exhaustive months of training, NASA wants to deploy the vehicle on Mars, where the environment is different from, yet somewhat similar to, Earth’s. The expectation is that the rover will need minimal “experimentation” (i.e., trial-and-error) on Mars by leveraging the knowledge acquired here, operating more surgically and effectively there. In the causal inference literature, this task falls under the rubric of transportability (Bareinboim and Pearl, 2013).

Despite many advances in this area, there is not much work on transportability for interventions consisting of conditional plans or stochastic policies. These interventions, known as soft interventions, are common in many practical applications; they are critical in reinforcement learning, for instance, where agents need to adapt to changing conditions in an unknown environment. This research extends transportability theory to encompass tasks where the input data, as well as the target of the analysis, involve soft interventions.

The paper introduces the first sufficient and necessary graphical condition and algorithm for deciding soft transportability, that is, for determining whether a query of interest is inferable from a combination of available (soft-)experimental data from different domains, based on causal and domain knowledge in the form of a graph. Finally, the research proves the completeness of a set of inference rules, known as σ-calculus, for this task. In contrast to the do-interventions commonly used in causal inference analysis, soft interventions model real-world decision making more closely. For this reason, the authors expect these results to help data scientists apply formal transportability analysis in broader and more realistic scenarios.

 

Multiparameter Persistence Image for Topological Machine Learning
Mathieu Carrière (DataShape), Andrew Blumberg (Columbia University)

Topological data analysis (TDA) is a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies.  In many applications, there are several different parameters one might wish to vary: for example, scale and density.  One can compute persistent invariants in this setting as well – a key problem is then to transform the resulting multi-scale descriptors, which are very complicated, into tractable numerical extracts.

This paper introduces a new descriptor for multi-parameter persistence, which we call the multi-parameter persistence image.  The resulting quantity is suitable for machine learning applications, is robust to perturbations in the data, has provably finer resolution than existing descriptors, and can be efficiently computed on data sets of realistic size.
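The single-parameter construction that the paper generalizes is itself easy to sketch: place a Gaussian bump at each (birth, persistence) point, weighted by its persistence, and rasterize onto a grid. Grid size and bandwidth below are arbitrary illustrative choices.

```python
import numpy as np

def persistence_image(diagram, resolution=20, sigma=0.1):
    """Rasterize a persistence diagram into a fixed-size image.

    `diagram` is an (n, 2) array of (birth, death) pairs. Each point
    contributes a Gaussian centered at (birth, death - birth), weighted
    by its persistence. The paper's multiparameter descriptor extends
    this construction to several filtration parameters at once.
    """
    births = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]
    xs = np.linspace(births.min(), births.max(), resolution)
    ys = np.linspace(0.0, pers.max(), resolution)
    gx, gy = np.meshgrid(xs, ys)
    img = np.zeros((resolution, resolution))
    for b, p in zip(births, pers):
        img += p * np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2 * sigma ** 2))
    return img
```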

 

No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix
Lampros Flokas (Columbia University), Emmanouil-Vasileios Vlatakis-Gkaragkounis (Columbia University), Thanasis Lianeas (National Technical University of Athens), Panayotis Mertikopoulos (Univ. Grenoble Alpes), Georgios Piliouras (Singapore University of Technology and Design)

One of the most fundamental ideas in game theory is that, when it comes to economic interactions between multiple agents, randomization is necessary to ensure economic stability. The canonical example is Rock-Paper-Scissors, where both agents must pick a strategy uniformly at random so that neither has a profitable deviation.

This paper establishes that such randomized equilibria are at odds with algorithmic stability: even when primed to choose them, optimization-driven dynamics will always fail to do so. This work thus reveals a novel discrepancy between economic stability and algorithmic stability, with important implications for both game theory and computer science.
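The phenomenon is easy to observe empirically. In the sketch below, two multiplicative-weights learners play Rock-Paper-Scissors in self-play; although the uniform mixed strategy is the unique Nash equilibrium, the day-to-day strategies spiral away from it (only their time average equilibrates). The step size and horizon are arbitrary choices.

```python
import numpy as np

# Payoff matrix for player 1 in Rock-Paper-Scissors (zero-sum).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def mw_selfplay(rounds=5000, eta=0.05):
    """Two multiplicative-weights learners playing RPS against each other.

    Play starts slightly off uniform (exactly uniform is a fixed point);
    the strategies then drift away from (1/3, 1/3, 1/3) toward the
    boundary of the simplex instead of converging to the equilibrium.
    """
    w1, w2 = np.array([1.02, 1.0, 1.0]), np.ones(3)
    trace = []
    for _ in range(rounds):
        x, y = w1 / w1.sum(), w2 / w2.sum()
        trace.append(np.abs(x - 1 / 3).max())       # gap from equilibrium
        w1 = w1 * np.exp(eta * (A @ y)); w1 /= w1.sum()     # player 1 update
        w2 = w2 * np.exp(eta * (-A.T @ x)); w2 /= w2.sum()  # player 2 update
    return trace

trace = mw_selfplay()
print(trace[0], trace[-1])   # the gap from equilibrium grows over time
```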

 

Optimal Private Median Estimation under Minimal Distributional Assumptions
Christos Tzamos (University of Wisconsin-Madison), Emmanouil-Vasileios Vlatakis-Gkaragkounis (Columbia University), Ilias Zadik (New York University)

One of the most fundamental tasks in statistics is the computation of the median of an underlying distribution from a finite number of samples. The median is commonly preferred to other location estimators, such as the mean or mode, because of its high breakdown point.

Even in the task of determining the critical high-risk age groups during the ongoing COVID-19 pandemic, the median estimator plays a significant role in balancing out possible outliers. Yet despite its aggregate nature, the median can still reveal crucial information about the participants in a dataset, with potential social, political, and financial implications.

This paper establishes an optimal and efficient mechanism for the computation of the median under differential privacy constraints to protect the individual or group privacy of the members of the sample.
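The paper’s mechanism is more refined, but the standard building block for private median estimation is the exponential mechanism: sample a candidate value with probability decaying exponentially in its rank distance from the true median. A textbook sketch over a bounded domain follows (domain bounds and grid size are illustrative assumptions).

```python
import numpy as np

def dp_median(x, epsilon, lo=0.0, hi=100.0, grid=1000):
    """Differentially private median via the exponential mechanism.

    The utility of a candidate value is minus its rank distance from the
    median; this utility has sensitivity 1, so sampling proportional to
    exp(epsilon * utility / 2) satisfies epsilon-differential privacy.
    This is the textbook mechanism, not the paper's optimal estimator.
    """
    x = np.sort(np.clip(x, lo, hi))
    candidates = np.linspace(lo, hi, grid)
    ranks = np.searchsorted(x, candidates)        # points below each candidate
    utility = -np.abs(ranks - len(x) / 2)         # rank distance from median
    logp = epsilon * utility / 2
    p = np.exp(logp - logp.max())                 # stable softmax
    return np.random.choice(candidates, p=p / p.sum())
```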

How a Peer-Led Class is Making Computer Science More Equitable

When the Emerging Scholars Program (ESP) was established 12 years ago, it was meant to introduce computer science (CS) topics to women interested in CS and to encourage them to pursue it as a major. The program has expanded from four sections in 2008 to eight; it is now open to everyone interested in CS and is focused on making the field more inclusive for BIPOC students.

“The field of computer science is not programming, it is about solving problems computationally,” said Diana Abagyan, a senior from Columbia College who is a peer leader in the program. “The interactive nature of the class, in a low-stakes environment, equips students with the confidence that they can succeed in CS as much as anyone else.”

ESP aims to broaden how first- and second-year students think about CS through a once-a-week, 75-minute workshop and discussion section that can be taken in parallel with introductory CS classes. Each workshop is run by a peer leader who presents topics and problems from a specific field in CS.

The pass/fail class has no homework, no programming, and no prerequisites other than an interest in computer science. The goal is to encourage more students to pursue computer science beyond the introductory level and into the major through active participation and discussion of CS-related topics in a positive, relaxed, and open environment.

“It was low-stakes CS work, which I found incredibly helpful for my first semester,” said A.J. Cephus, a freshman from the School of General Studies who decided to pursue CS as a major after taking the course. Like many ESP students, he does not have a computer science background, yet he found the course extremely helpful and said it made CS more approachable. Continued Cephus, “It was also inspiring to interact with undergraduate CS majors who are well versed in the topics.”

“We use problem-solving activities to help students wrap their heads around the concepts,” said Lindsey Wales, a peer leader who first encountered the program as a freshman. For example, in the natural language processing seminar, students read sentences in an unfamiliar language alongside English translations and use context clues to write and read sentences in that language. This helps them start to think about how machine translation works and which features are important to the structure of a language. Working together, students arrive at a solution faster than they would alone, which is also a good introduction to team problem-solving.

“That teamwork is essential and shows students that not everything about CS is programming,” said Adam Cannon, the senior lecturer in discipline who started the program and serves as its faculty advisor. Peer-led team learning has proven useful for both students and peer leaders in developing their skills. “Sometimes students don’t see this when their only early experiences are programming-intensive introductory classes.”

The teaching style of ESP relies on small-group interaction, discussion, and activities rather than lectures. The program is also one of the department’s efforts to create a network of students who have a community to turn to. ESP peer leaders also serve as points of contact within the major and can provide important advice on classes, internships, and career paths.

“The best part of computer science is creative problem-solving and we try to show our students this from the beginning,” said Timothy Randolph, a third-year PhD student. He and Roland Maio are program coordinators who guide the program, support the peer leaders and workshop assistants, and keep everything running as smoothly as possible. Within the CS department, ESP has a unique opportunity and responsibility to make the major engaging to a wider variety of students and give them the skills and support network they need to succeed. 

“It’s not always easy, but I like knowing that we can make Columbia CS stronger and improve the experience of our students,” said Randolph. 

The Emerging Scholars Program team (left to right): Diana Abagyan, Jen Ozmen, Ajita Bala, Maria Teresa Tome, Pranjali Sharma, Walden Wang, Lindsey Wales, Roland Maio, Gala Kalevic, Tim Randolph

 

If you are interested in the Emerging Scholars Program, it is looking for students and workshop assistants for the Spring semester. Check out its website for details.