They Found a Way to Thematically Sort All of Wikipedia on a Laptop

Professor David Blei, with co-authors Matthew Hoffman and Francis Bach, is recognized with a Test of Time Award at NeurIPS, the world’s top machine learning conference, for scaling his topic modeling algorithm to billions of documents.

CS Welcomes Three New Faculty

Josh Alman, Toniann Pitassi, and Richard Zemel join the department. They will facilitate research and learning in CS theory, machine learning, and artificial intelligence.

Josh Alman
Assistant Professor
PhD Computer Science, Massachusetts Institute of Technology 2019
MS Computer Science, Stanford University 2016
BS Mathematics, Massachusetts Institute of Technology 2014

Josh Alman is a theoretical computer scientist who works broadly throughout algorithm design and complexity theory. His current research focuses on algorithms for fundamental algebraic problems. In particular, he is interested in how quickly one can multiply matrices and compute important linear transforms like Fourier transforms, and how to apply algebraic tools like these to new problems throughout computer science.

Alman joins the Theory Group and looks forward to working with students who have a background in theoretical computer science or mathematics on various projects. This Fall, he will teach a graduate class on algebraic techniques in algorithms and complexity.

Toniann Pitassi
Jeffrey L. and Brenda Bleustein Professor of Engineering
PhD Computer Science, University of Toronto 1992
MS Computer Science, Pennsylvania State University 1985
BS Chemistry & Computer Science, Pennsylvania State University 1985

Pitassi’s research area is computational complexity: what are the inherent limitations on the resources (time, space, randomness) required to solve fundamental computational problems? An important direction aimed at resolving the ultimate question of this type, the P versus NP question, is propositional proof complexity, which studies the difficulty of proving tautological statements in standard proof systems. She has also worked extensively in communication complexity which studies how much information must be communicated between two or more players in order to compute a joint function of their inputs. Her other research interests lie in the foundations of machine learning, particularly in the areas of privacy, fairness, and reproducibility.

Previously Pitassi was the Bell Research Chair at the University of Toronto, a Canadian Institute for Advanced Research research chair, and a research lead at the Schwartz-Reisman Institute for Technology and Society. She currently holds a five-year appointment as visiting professor at the Institute for Advanced Study. Pitassi is the 2021 EATCS Award recipient and was an invited speaker at the International Congress of Mathematicians

At Columbia, Pitassi joins the department’s Theory Group and the Machine Learning Group. She is excited to collaborate with new colleagues and graduate students at Columbia and to explore New York. Her hobbies and outside interests are constantly changing with the latest being stained glass.

Richard Zemel
Professor of Computer Science
PhD Computer Science, University of Toronto 1994
MS Computer Science, University of Toronto 1989
BA History and Science, Harvard University 1984

Richard Zemel’s research focuses on machine learning and artificial intelligence. He is also interested in natural intelligence, including neuroscience and cognitive science. His recent research targets learning with few labels and creating robust and controllable machine learning systems, which can transfer to a variety of tasks and domains. He also has a strong interest in algorithmic fairness.

Previously, Zemel was the NSERC Industrial Research Chair in Machine Learning at the University of Toronto, and co-founded and served as the Research Director of the Vector Institute for Artificial Intelligence. He is the recipient of an ONR Young Investigator Award, an NVIDIA Pioneer of AI Award, and is a Fellow of the Canadian Institute for Advanced Research.

This Fall, he will teach a course on Neural Networks and Deep Learning and will teach a seminar class on machine learning in the Spring term. Zemmel is looking for PhD students who are interested in machine learning to join his research group. In his spare time, he likes sports such as hockey, squash and biking, and eating.

13 Research Papers Accepted to ICML 2021

Papers from CS researchers have been accepted to the 38th International Conference on Machine Learning (ICML 2021).

Associate Professor Daniel Hsu was one of the publication chairs of the conference and Assistant Professor Elham Azizi helped organize the 2021 ICML Workshop on Computational Biology. The workshop highlighted how machine learning approaches can be tailored to making both translational and basic scientific discoveries with biological data.

Below are the abstracts and links to the accepted papers.

A Proxy Variable View of Shared Confounding
Yixin Wang Columbia University, David Blei Columbia University

Causal inference from observational data can be biased by unobserved confounders. Confounders—the variables that affect both the treatments and the outcome—induce spurious non-causal correlations between the two. Without additional conditions, unobserved confounders generally make causal quantities hard to identify. In this paper, we focus on the setting where there are many treatments with shared confounding, and we study under what conditions is causal identification possible. The key observation is that we can view subsets of treatments as proxies of the unobserved confounder and identify the intervention distributions of the rest. Moreover, while existing identification formulas for proxy variables involve solving integral equations, we show that one can circumvent the need for such solutions by directly modeling the data. Finally, we extend these results to an expanded class of causal graphs, those with other confounders and selection variables.

Unsupervised Representation Learning via Neural Activation Coding
Yookoon Park Columbia University, Sangho Lee Seoul National University, Gunhee Kim Seoul National University, David Blei Columbia University

We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications. We argue that the deep encoder should maximize its nonlinear expressivity on the data for downstream predictors to take full advantage of its representation power. To this end, NAC maximizes the mutual information between activation patterns of the encoder and the data over a noisy communication channel. We show that learning for a noise-robust activation code increases the number of distinct linear regions of ReLU encoders, hence the maximum nonlinear expressivity. More interestingly, NAC learns both continuous and discrete representations of data, which we respectively evaluate on two downstream tasks: (i) linear classification on CIFAR-10 and ImageNet-1K and (ii) nearest neighbor retrieval on CIFAR-10 and FLICKR-25K. Empirical results show that NAC attains better or comparable performance on both tasks over recent baselines including SimCLR and DistillHash. In addition, NAC pretraining provides significant benefits to the training of deep generative models. Our code is available at https://github.com/yookoon/nac.

The Logical Options Framework
Brandon Araki MIT, Xiao Li MIT, Kiran Vodrahalli Columbia University, Jonathan DeCastro Toyota Research Institute, Micah Fry MIT Lincoln Laboratory, Daniela Rus MIT CSAIL

Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing the task as an automaton and integrating it into learning and planning. We provide and prove conditions under which LOF will learn satisfying, optimal policies. And lastly, we show how LOF’s learned policies can be composed to satisfy unseen tasks with only 10-50 retraining steps on our benchmarks. We evaluate LOF on four tasks in discrete and continuous domains, including a 3D pick-and-place environment.

Estimating Identifiable Causal Effects on Markov Equivalence Class through Double Machine Learning
Yonghan Jung Columbia University, Jin Tian Columbia University, Elias Bareinboim Columbia University

General methods have been developed for estimating causal effects from observational data under causal assumptions encoded in the form of a causal graph. Most of this literature assumes that the underlying causal graph is completely specified. However, only observational data is available in most practical settings, which means that one can learn at most a Markov equivalence class (MEC) of the underlying causal graph. In this paper, we study the problem of causal estimation from a MEC represented by a partial ancestral graph (PAG), which is learnable from observational data. We develop a general estimator for any identifiable causal effects in a PAG. The result fills a gap for an end-to-end solution to causal inference from observational data to effects estimation. Specifically, we develop a complete identification algorithm that derives an influence function for any identifiable causal effects from PAGs. We then construct a double/debiased machine learning (DML) estimator that is robust to model misspecification and biases in nuisance function estimation, permitting the use of modern machine learning techniques. Simulation results corroborate with the theory.

Environment Inference for Invariant Learning
Elliot Creager University of Toronto, Joern Jacobsen Apple Inc., Richard Zemel Columbia University

Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into domains'' orenvironments”. Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds dataset. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.

SketchEmbedNet: Learning Novel Concepts by Imitating Drawings
Alex Wang University of Toronto, Mengye Ren University of Toronto, Richard Zemel Columbia University

Sketch drawings capture the salient information of visual concepts. Previous work has shown that neural networks are capable of producing sketches of natural objects drawn from a small number of classes. While earlier approaches focus on generation quality or retrieval, we explore properties of image representations learned by training a model to produce sketches of images. We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting. Additionally, we find that these learned representations exhibit interesting structure and compositionality.

Universal Template for Few-Shot Dataset Generalization
Eleni Triantafillou University of Toronto, Hugo Larochelle Google Brain, Richard Zemel Columbia University, Vincent Dumoulin Google

Few-shot dataset generalization is a challenging variant of the well-studied few-shot classification problem where a diverse training set of several datasets is given, for the purpose of training an adaptable model that can then learn classes from \emph{new datasets} using only a few examples. To this end, we propose to utilize the diverse training set to construct a \emph{universal template}: a partial model that can define a wide array of dataset-specialized models, by plugging in appropriate components. For each new few-shot classification problem, our approach therefore only requires inferring a small number of parameters to insert into the universal template. We design a separate network that produces an initialization of those parameters for each given task, and we then fine-tune its proposed initialization via a few steps of gradient descent. Our approach is more parameter-efficient, scalable and adaptable compared to previous methods, and achieves the state-of-the-art on the challenging Meta-Dataset benchmark.

On Monotonic Linear Interpolation of Neural Network Parameters
James Lucas University of Toronto, Juhan Bae University of Toronto, Michael Zhang University of Toronto, Stanislav Fort Google AI, Richard Zemel Columbia University, Roger Grosse University of Toronto

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective. This Monotonic Linear Interpolation (MLI) property, first observed by Goodfellow et al. 2014, persists in spite of the non-convex objectives and highly non-linear training dynamics of neural networks. Extending this work, we evaluate several hypotheses for this property that, to our knowledge, have not yet been explored. Using tools from differential geometry, we draw connections between the interpolated paths in function space and the monotonicity of the network — providing sufficient conditions for the MLI property under mean squared error. While the MLI property holds under various settings (e.g., network architectures and learning problems), we show in practice that networks violating the MLI property can be produced systematically, by encouraging the weights to move far from initialization. The MLI property raises important questions about the loss landscape geometry of neural networks and highlights the need to further study their global properties.

A Computational Framework For Slang Generation
Zhewei Sun University of Toronto, Richard Zemel Columbia University, Yang Xu University of Toronto

Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems. We take an initial step toward machine generation of slang by developing a framework that models the speaker’s word choice in slang context. Our framework encodes novel slang meaning by relating the conventional and slang senses of a word while incorporating syntactic and contextual knowledge in slang usage. We construct the framework using a combination of probabilistic inference and neural contrastive learning. We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang word usages from 1960s to 2000s. We interpret the proposed models and find that the contrastively learned semantic space is sensitive to the similarities between slang and conventional senses of words. Our work creates opportunities for the automated generation and interpretation of informal language.

Wandering Within A World: Online Contextualized Few-Shot Learning
Mengye Ren University of Toronto, Michael Iuzzolino Google Research, Michael Mozer Google Research, Richard Zemel Columbia University

We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retrieve learned skills in the past, our online few-shot learning setting also features an underlying context that changes throughout time. Object classes are correlated within a context and inferring the correct context can lead to better performance. Building upon this setting, we propose a new few-shot learning dataset based on large scale indoor imagery that mimics the visual experience of an agent wandering within a world. Furthermore, we convert popular few-shot learning approaches into online versions and we also propose a new contextual prototypical memory model that can make use of spatiotemporal contextual information from the recent past.

Bayesian Few-Shot Classification With One-Vs-Each Polya-Gamma Augmented Gaussian Processes
Jake Snell University of Toronto, Richard Zemel Columbia University

Few-shot classification (FSC), the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. Bayesian methods are well-suited to tackling the fundamental issue of overfitting in the few-shot scenario because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data. Contemporary approaches to Bayesian few-shot classification maintain a posterior distribution over model parameters, which is slow and requires storage that scales with model size. Instead, we propose a Gaussian process classifier based on a novel combination of Pólya-Gamma augmentation and the one-vs-each softmax approximation that allows us to efficiently marginalize over functions rather than model parameters. We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.

Theoretical Bounds On Estimation Error For Meta-Learning
James Lucas University of Toronto, Mengye Ren University of Toronto, Irene Kameni African Master for Mathematical Sciences, Toni Pitassi Columbia University, Richard Zemel Columbia University

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models can be adapted to more realistic settings where train and test distributions differ. Unfortunately, there is severely limited theoretical support for these algorithms and little is known about the difficulty of these problems. In this work, we provide novel information-theoretic lower-bounds on minimax rates of convergence for algorithms that are trained on data from multiple sources and tested on novel data. Our bounds depend intuitively on the information shared between sources of data, and characterize the difficulty of learning in this setting for arbitrary algorithms. We demonstrate these bounds on a hierarchical Bayesian model of meta-learning, computing both upper and lower bounds on parameter estimation via maximum-a-posteriori inference.

A PAC-Bayesian Approach To Generalization Bounds For Graph Neural Networks
Renjie Liao University of Toronto, Raquel Urtasun University of Toronto, Richard Zemel Columbia University

In this paper, we derive generalization bounds for the two primary classes of graph neural networks (GNNs), namely graph convolutional networks (GCNs) and message passing GNNs (MPGNNs), via a PAC-Bayesian approach. Our result reveals that the maximum node degree and spectral norm of the weights govern the generalization bounds of both models. We also show that our bound for GCNs is a natural generalization of the results developed in arXiv:1707.09564v2 [cs.LG] for fully-connected and convolutional neural networks. For message passing GNNs, our PAC-Bayes bound improves over the Rademacher complexity based bound in arXiv:2002.06157v1 [cs.LG], showing a tighter dependency on the maximum node degree and the maximum hidden dimension. The key ingredients of our proofs are a perturbation analysis of GNNs and the generalization of PAC-Bayes analysis to non-homogeneous GNNs. We perform an empirical study on several real-world graph datasets and verify that our PAC-Bayes bound is tighter than others.

Shuran Song named 2021 Microsoft Research Faculty Fellow

Assistant Professor Shuran Song has won a 2021 Microsoft Research Faculty Fellowship. The fellowship recognizes innovative, promising new faculty whose exceptional talent for innovation identifies them as emerging leaders in their fields.

Xi Chen Awarded the Gödel Prize

Congratulations to the Class of 2021

The department is extremely proud of all of our students and would like to honor this year’s graduates! We look forward to when we can all come together and celebrate in person.

First, a tribute from our faculty!

During Class Day, awards were given to students who excelled in academics, students with independent research projects, and to those with outstanding performance in teaching and services to the department. The list of awardees is in this year’s graduation handout.

Professor Augustin Chaintreau received the Janette and Armen Avanessians Diversity Award for outstanding research and leadership in advancing diversity in departmental, school, and university programs at Columbia.

At this year’s commencement, more than 600 students received a computer science degree. Click on the logos to see the CS graduates from each college.

Student spotlight

More memories from the past four years…

Columbia and Barnard students attend the 2019 Grace Hopper Conference in Orlando.
Lydia Chilton is the best professor at Columbia hands down. No one brings energy to the CS department the way that she does. GOAT. - Dillon Hayes
Ciaran Beckford and The Ngo at the PennApps 2017 Hackathon.
Charles Baird coding in the kitchen.
AI Generated Artwork, JR Carneiro CS SEAS '21 and Caroline Lin CS CC '21
Bryan Lei and John Wang at the AP Hackathon in Spring 2018
Audrey Cheng with fellow SWE interns at a Salesforce intern event in Denver, Colorado.
Cryptography with Tal! This class was so fascinating. - Ari Hirsch
Arkadiy Saakyan with friends at the tree lighting.
Anushri Arora took this photo of Olha Maslova while they were working on an Analysis of Algorithms problem set at the DSI lounge in late February 2020 - just before the "Stay at Home" story began.
Anastasia Dmitrienko and fellow TA Sophia Kolak giving a midterm recitation for CS Theory.
Amir Idris with the Columbia Data Science Society 2018
Alexander Peile and Tracy Chen declaring majors
Alexander Cohen on the right with other (non-Columbia) students in San Francisco during an internship.
Alex Horimbere on the day when he arrived at Columbia.
Left to right: Cesar Ramos Medina, Iliana Cantu, Ecenaz Ozmen, Yefri Gaitan, Daniel Garces Botero at Google Games. Where they won the most energetic table. Go red team!
Daniel Hanoch (in black jacket) and co-workers at the Earth Institute.
Yarden Carmeli at the Low (High) Beach
The Operating Systems group Maya, Ecenaz, and Ahmed grinding hard for OS projects.
Tanvi Hisaria, Manav, and Aeshna making the best of the Zoom year!
Shout out to DSA with Professor Bauer because he is almost always online, super approachable, and really makes an effort to help me understand CS concepts in general. - Andrew Molina
Sarah Radway and Sophia Kolak at Hack Harvard...trying to win "most useless hack" and still losing.
Sharon Jian with former and current TAs for UI Design grading together
Payal Chandak exploring the tunnels on campus
Rediet Bekele sneaking a Post Lab 7 nap in AP class (Sorry, Jae 🙁 )
Romy Zilkha studying at Butler Library
Rupal Gupta at Googleplex, where she interned her junior summer, and will be joining full-time.
Sara Bernstein and Rebecca Narin (BC '21), who became good friends after taking COMS 1004 together, as they finished their last class of Freshmen year!
Owen Bishop, Dominic Dyer, Jackson Storey, and Jon Lauer at the park.
Omer Fahri Onder kicking an opponent at the MIT taekwondo competition
Huge thanks to Professor Blaer for an engaging, fun, and interesting course in data structures. The knowledge I gained in this course has been the foundation for my coding abilities and a sturdy rock in my journey to becoming a data science graduate. - Aaron Morrill (with his sister, Haley)
Visiting Giphy as a part of Entrepreneurship Via Exploration with Anne Xie, Jenny Li, Monika Francsics, Sungbin Kim, Jeff Huang, Jordan Ramos, and Nico Molina.
Michael Karasev, Chris Mendell, Rodolfo Raimundo, Nathelie Hager, and Sabrina Selleck. After flying to El Paso during finals week and pulling 3 all-nighters, they were finally able to watch their scientific payload launch on board of Blue Origin's rocket.
Harrison Qu in matrix code.
Left to Right: Kelsey Namba, Serena Tam, David Ji at the Princeton Hackathon.
Lucas Hahn
Manasi Sharma at the GROWTH 2019 Conference at San Diego State University during a coding challenge.
Mariya Delyakova and COMS 1004 classmates a week before COVID hit.
Marwan Salam in Havemeyer 309 - Concluding AP Lecture
Jonathan Sanabria (center in blue) with other members of the Columbia Robotics Club working on a prototype for an underwater ROV.
$Discrete Math got me like...$
Discrete Math got me like...
Jinho Lee on a site tour of Google.
Jin Woo Won and Junyao Shi at the AP hackathon, hacking away as Jae played some music.
Jeffrey Kline - remembering long days in the library.
Malware Analysis is a great class - I love reverse engineering and learning how programs and operating systems work, so this class was particularly fun for me. - Christopher Smith (with his very sleepy pup!)
Samuel Fu's internship memories in 2019
JADE cohort January 2018 at Facebook
Jason Herrera the the Brooklyn Bridge in 2019.
Giovanni Sanchez hanging out with friends at Faculty House.
Gael Zendejas, Vikram Ho, Xiao Lim, and Adam Lin at the AP Hackathon.
Ryan James and fellow classmates in AP
From left: Kelsey Namba, Sarah Leventhal, Serena Tam. Taken in Lerner after a whole evening of working on a lab assignment in Advanced Programming.
Beom Joon Baek with James Valentini during Freshman year.
The Operating Systems group Maya, Ecenaz, and Ahmed grinding hard for OS projects.
Daniel Hanoch (in black jacket) and co-workers at the Earth Institute.
Danny Parrill at the virtual Columbia GS Honor Society induction.
Dorothee Grant and Serena Killion in class when Professor Sethumadhavan took attendance by having the students send a photo of them holding up their UNIs.
Daniel Halmos presenting the architecture for a group project in Topics for Robotic Learning.
Left to right: Cesar Ramos Medina, Iliana Cantu, Ecenaz Ozmen, Yefri Gaitan, Daniel Garces Botero at Google Games. Where they won the most energetic table. Go red team!
Professor Brian Smith is the best! - Rico Pesce