13 Papers From the Department Accepted to NeurIPS 2020


Researchers from the Theory, Security & Privacy, and Machine Learning groups will present their papers at the 34th Conference on Neural Information Processing Systems (NeurIPS), one of the premier conferences for machine learning.


Smoothed Analysis of Online and Differentially Private Learning
Nika Haghtalab Cornell University, Tim Roughgarden Columbia University, Abhishek Shetty Cornell University

Differentially private computation and online learning over time have long been known to be much more difficult in a worst-case environment (with an adversary choosing the input) than in an average-case environment (with inputs drawn from an unknown probability distribution). 

This work shows that in the “smoothed analysis” model, in which an adversarially chosen input is perturbed slightly by nature, private computation and online learning are almost as easy as in an average-case setting.  This work also features a novel application of bracketing entropy to regret-minimization in online learning.


HYDRA: Pruning Adversarially Robust Neural Networks
Vikash Sehwag Princeton University, Shiqi Wang Columbia University, Prateek Mittal Princeton University, Suman Jana Columbia University

In safety-critical but computationally resource-constrained applications, deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size (often millions of parameters). To overcome these two challenges jointly, the authors propose to make pruning techniques aware of the robust training objective and let the training objective guide the search for which connections to prune. They realize this insight by formulating the pruning objective as an empirical risk minimization problem which is solved efficiently using stochastic gradient descent. They further demonstrate the proposed approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy simultaneously, on various vision tasks.

The authors show that it is crucial to make pruning techniques aware of the robust training objectives to obtain neural networks that are both compact and robust. They demonstrate the success of the proposed approach, HYDRA across three popular vision datasets with four state-of-the-art robust training techniques. Interestingly, they also show the existence of highly robust sub-networks within non-robust networks found by HYDRA.


Hedging in Games: Faster Convergence of External and Swap Regrets
Xi Chen Columbia University, Binghui Peng Columbia University

Online algorithms for regret minimization play an important role in many applications where real-time sequential decision making is crucial. The paper studies the performance of the classic Hedge algorithm and its optimistic variant under the setting of repeated multiplayer games. The highlights are improved analyses on regrets that lead to faster convergence to (coarse) correlated equilibria.


Planning with General Objective Functions: Going Beyond Total Rewards
Ruosong Wang Carnegie Mellon University, Peilin Zhong Columbia University, Simon Du University of Washington, Russ Salakhutdinov Carnegie Mellon University, Lin Yang University of California, Los Angeles

This work studied a general class of problems in reinforcement learning. In particular, it proposed a planning algorithm for a deterministic system with general objective functions.

The algorithm is based on layering techniques in the sketching and streaming literature. This is the first  time that the layering technique appears in sequential decision-making algorithms. Furthermore, it also proposed a new framework of dynamic programming approach for solving the deterministic system.


Ensuring Fairness Beyond the Training Data
Debmalya Mandal Columbia University, Samuel Deng Columbia University, Suman Jana Columbia University, Jeannette Wing Columbia University, Daniel Hsu Columbia University

This paper initiates the study of fair classifiers that are robust to perturbations to the training distribution. Despite recent progress, the literature on fairness in machine learning has largely ignored the design of such fair and robust classifiers. Instead, fairness in these previous works was evaluated on sample datasets that may be unreliable due to sampling bias and missing attributes.

The authors of the present paper provide a new algorithm for training classifiers that are fair not only with respect to the training data, but also with respect to weight perturbations of the training data, which may account for discrepancies due to sampling bias and missing attributes. The approach is based on a min-max objective that is solved using a new iterative algorithm based on online learning. Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, the robustly-fair classifiers retain fairness guarantees and test accuracy for a large class of perturbations on the test set.


Causal Imitation Learning With Unobserved Confounders
Junzhe Zhang Columbia University, Daniel Kumor Purdue University, Elias Bareinboim Columbia University

One of the common ways children learn is by mimicking adults. If an agent is able to perfectly copy the behavior of a human expert, one may be tempted to surmise that the agent will perform equally well at the same type of task.

The paper “Causal Imitation Learning with Unobserved Confounders” shows that this notion is somewhat naive, and the agent would need to have exactly the same sensory inputs as the human for perfect imitation to translate into the same distribution of rewards. Perhaps surprisingly, if its inputs differ even slightly from the demonstrator’s, the agent’s performance can be arbitrarily low, even when it learns to copy the demonstrator’s actions perfectly.

This paper investigates imitation learning through a causal lens and reveals why the agent may fail to learn even when infinite demonstration’s trajectories are available. It then introduces a formal treatment to the problem, including clearly formulated conditions for when imitation learning can succeed. These criteria advance people’s understanding of how to identify situations/assumptions sufficient for imitation learning. Finally, it introduces a new algorithm for the agent to learn a strategy based on its own sensory capabilities, yet with performance identical to the expert.


Learning Causal Effects via Weighted Empirical Risk Minimization
Yonghan Jung Purdue University, Jin Tian Iowa State University, Elias Bareinboim Columbia University

Learning causal effects from data is a fundamental problem across the empirical sciences. Determining the identifiability of a target effect from a combination of the observational distribution and the causal graph underlying a phenomenon is well-understood in theory. However, in practice, it remains a challenge to apply the identification theory to estimate the identified causal functionals from finite samples. Although a plethora of practical estimators has been developed under the setting known as the back-door, also called conditional ignorability, there exists still no systematic way of estimating arbitrary causal functionals that are both computationally and statistically attractive.

This work aims to bridge this gap, from causal identification to causal estimation. The authors note that estimating functionals from limited samples based on the empirical risk minimization (ERM) principle has been pervasive in the machine learning literature, and these methods have been extended to causal inference under the back-door setting. In this paper, they introduce a learning framework that marries two families of methods, benefiting from the generality of the causal identification theory and the effectiveness of the estimators produced based on ERM’s principle. Specifically, they develop a sound and complete algorithm that generates causal functionals in the form of weighted distributions that are amenable to the ERM optimization. They then provide a practical procedure for learning causal effects from finite samples and a causal graph. Finally, experimental results support the effectiveness of our approach.


Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning
Amin Jaber Purdue University, Murat Kocaoglu MIT-IBM Watson AI Lab, Karthikeyan Shanmugam MIT-IBM Watson AI Lab, Elias Bareinboim Columbia University

One fundamental problem in the empirical sciences is of reconstructing the causal structure that underlies a phenomenon of interest through observation and experimentation. While there exists a plethora of methods capable of learning the equivalence class of causal structures that are compatible with observations, it is less well-understood how to systematically combine observations and experiments to reconstruct the underlying structure. 

This work investigates the task of structural learning from a combination of observational and experimental data when the interventional targets are unknown. The relaxation of not knowing the interventional targets is quite practical since in some fields, such as molecular biology, we are unable to identify the variables affected by interventions.

The topology of a causal structure imprints a set of non-parametric constraints over the generated distributions, one type of which are conditional independences. The proposed work derives a graphical characterization that allows one to test whether two causal structures are indistinguishable with respect to the set of constraints in the available data. As a result, an algorithm is developed that is capable of harnessing the collection of data to learn the equivalence class of the true causal structure. The algorithm is proven to be complete, in the sense that it is the most informative in the sample limit.


Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe
Sanghack Lee Columbia University, Elias Bareinboim Columbia University

Intelligent agents are continuously faced with the challenge of optimizing a policy based on what they can observe (see) and which actions they can take (do) in the environment where they are deployed. Most policies can be parametrized in terms of these two dimensions, i.e., as a function of what can be seen and done given a certain situation, which we call a mixed policy.

This work investigates several properties of the mixed policies class and provides a characterization with respect to criteria such as optimality and non-redundancy. In particular, the authors introduce a graphical criterion to identify unnecessary contexts for a set of actions, leading to a natural characterization of the non-redundancy of mixed policies. Then derive sufficient conditions under which one strategy can dominate the other concerning their maximum achievable expected rewards or optimality. This characterization leads to a fundamental understanding of the space of mixed policies and possible refinement of the agent’s strategy to converge to the optimum faster and more robustly. One surprising result of the causal characterization is that the agent following a standard, non-causal approach — i.e., intervening on all intervenable variables and observing all available contexts — may be hurting itself, and may never be able to achieve an optimal performance regardless of the number of interactions performed. 


General Transportability of Soft Interventions: Completeness Results
Juan Correa Columbia University, Elias Bareinboim Columbia University

The challenge of generalizing causal knowledge across different environments is pervasive in scientific explorations, including in artificial intelligence, machine learning, and data science. Scientists collect data and carry out experiments in a particular environment with the intent of, almost invariably, use them for applications deployed under different conditions. For instance, consider a rover trained in the California desert. After exhaustive months of training, NASA wants to deploy the vehicle on Mars, where the environment is not the same, yet somewhat similar to Earth. The expectation is that the rover will need minimal “experimentation” (i.e., trial-and-error) on Mars by leveraging the knowledge acquired here, operating more surgically and effectively there. In the causal inference literature, this task falls under the rubric of transportability (Bareinboim and Pearl, 2013).

Despite many advances in this area, there is not much work on transportability for interventions consisting of conditional plans or stochastic policies. These interventions, known as soft interventions, are common in many practical applications. For instance, it is critical in Reinforcement Learning, where the agents need to adapt to changing conditions in an unknown environment. This research extends transportability theory to encompass tasks where input data, as well as the target of the analysis, involve soft interventions.

The paper introduces the first sufficient and necessary graphical condition and algorithm to decide soft-transportability. That is, to determine if a query interest is inferable from a combination of available (soft-)experimental data from different domains, based on causal and domain knowledge in the form of a graph. Finally, the research proves the completeness of a set of inference rules, known as σ-calculus, for this task. In contrast to do-interventions commonly used in causal inference analysis, soft interventions model real-world decision making more closely. For this reason, authors expect these results will help data scientists apply formal transportability analysis in broader and more realistic scenarios.


Multiparameter Persistence Image for Topological Machine Learning
Mathieu Carrière DataShape, Andrew Blumberg Columbia University

Topological data analysis (TDA) is a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies.  In many applications, there are several different parameters one might wish to vary: for example, scale and density.  One can compute persistent invariants in this setting as well – a key problem is then to transform the resulting multi-scale descriptors, which are very complicated, into tractable numerical extracts.

This paper introduces a new descriptor for multi-parameter persistence, which we call the multi-parameter persistence image.  The resulting quantity is suitable for machine learning applications, is robust to perturbations in the data, has provably finer resolution than existing descriptors, and can be efficiently computed on data sets of realistic size.


No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix
Lampros Flokas Columbia University, Emmanouil-Vasileios Vlatakis-Gkaragkounis Columbia University, Thanasis Lianeas National Technical University of Athens, Panayotis Mertikopoulos Univ. Grenoble Alpes, Georgios Piliouras Singapore University of Technology and Design

One of the most fundamental ideas in game theory is that when it comes to economic interactions between multiple agents randomization is necessary to ensure economic stability. The canonical example here would be Rock-Paper-Scissors where it is necessary for both agents to pick a strategy uniformly at random so that neither of them has a profitable deviation.

This paper establishes that such randomized equilibria are at odds with algorithmic stability showing that even when primed to choose them optimization-driven dynamics will always fail to do so. Thus, this work reveals a novel discrepancy between economic stability and algorithmic stability with important implications both for game theory as well as computer science.


Optimal Private Median Estimation under Minimal Distributional Assumptions
Christos Tzamos University of Wisconsin-Madison, Emmanouil-Vasileios Vlatakis-Gkaragkounis Columbia University, Ilias Zadik New York University

One of the most fundamental tasks in statistics is the computation of the median of an underlying distribution from a finite number of samples. Median is commonly preferable to other location estimation methods, like the mean or mode, due to its high break-down point.

Even in the task of determining the critical high-risk age groups during the ongoing COVID-19 pandemic, median estimator plays a significant role in balancing out some possible outliers. Despite its aggregating nature, median could give crucial information about the participants of a data set causing probably several social, political, and financial implications.

This paper establishes an optimal and efficient mechanism for the computation of the median under differential privacy constraints to protect the individual or group privacy of the members of the sample.