The goal of the kinship verification competition is to determine whether a parent-child, sibling, or grandparent-grandchild relationship exists between two people. Kinship verification matters in social media applications, forensic investigations, finding missing children, and reuniting families. The team demonstrated high-quality kinship verification by participating in the FG 2021 Recognizing Families in the Wild Challenge, which provides the largest publicly available dataset in the field. Their third-place approach was the first to ensemble models written by humans with models written automatically by OpenAI Codex.
As part of the unique Deep Learning course curriculum, students compete in common task framework competitions, which let them test the waters in the real world while advancing science. This semester, Drori and teaching assistants Newman Cheng and Vaibhav Bagri ran feasibility tests on the Kinship Verification Challenge and found it in line with the course goals.
Students used the kinship verification dataset to design, develop, and train deep learning models. Over the course of three weeks, teams worked on and improved their submissions while competing with groups from across the world. Drori then worked with the leading teams, using OpenAI Codex to improve the verification models even further. The effort paid off: the students finished in the top three of the competition leaderboard and secured their placement by quickly writing up their findings.
The winning team is composed of graduate students Junyi Huang (Mathematics), Maxwell Strome (Computer Science), Ian Jenkins (Applied Physics and Math), Parker Williams (Computer Science), Bo Feng (Electrical Engineering), Yaning Wang (Electrical Engineering), and Roman Wang (Computer Science).
“Winning third place in this international challenge is an excellent achievement. The teams used both humans and machines to automatically write the code, which is a first and commendable feat!” said Drori.
The computer science community lost Davide Giri on December 2nd. For the past seven years, he worked in the System-Level Design Group under the guidance of Professor Luca Carloni. Friends and colleagues share their thoughts and memories of Davide.
The System-Level Design Group (2019). Front row (left to right): Luca Piccolboni, Kuan-Lin Chiu, Davide Giri, Jihye Kwon, Maico Cassel. Back row (left to right): Paolo Mantovani, Guy Eichler, Luca Carloni, Joseph Zuckerman, Giuseppe Di Guglielmo
Joheen Chakraborty (Astrophysics and Computer Science) won the James B. Willett Education Memorial Scholarship from the Universities Space Research Association.
Researchers from the department presented machine learning and artificial intelligence research at the thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021).
One of the central elements of any causal inference is an object called a structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.
It is common to quantify causal effects with mean values, which, however, may fail to capture significant distribution differences of the outcome under different treatments. We study the problem of estimating the density of the causal effect of a binary treatment on a continuous outcome given a binary instrumental variable in the presence of covariates. Specifically, we consider the local treatment effect, which measures the effect of treatment among those who comply with the assignment under the assumption of monotonicity (only the ones who were offered the treatment take it). We develop two families of methods for this task, kernel-smoothing and model-based approximations – the former smoothes the density by convolving with a smooth kernel function; the latter projects the density onto a finite-dimensional density class. For both approaches, we derive double/debiased machine learning (DML) based estimators. We study the asymptotic convergence rates of the estimators and show that they are robust to the biases in nuisance function estimation. We illustrate the proposed methods on synthetic data and a real dataset called 401(k).
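The kernel-smoothing idea can be illustrated with a generic kernel density estimate. The sketch below shows only the smoothing step, not the paper's DML estimator; the toy data, grid, and bandwidth are illustrative assumptions. It also shows why densities matter: the two toy outcome distributions have the same mean but very different shapes.

```python
import numpy as np

def gaussian_kernel_density(samples, grid, bandwidth):
    """Estimate a density on `grid` by convolving the empirical distribution
    of `samples` with a Gaussian kernel of width `bandwidth`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One Gaussian bump per sample, averaged over samples.
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# Toy example: outcomes under two treatments with equal means but different
# spreads, a difference that a mean-based effect estimate would miss entirely.
rng = np.random.default_rng(0)
y_treated = rng.normal(0.0, 2.0, size=5000)
y_control = rng.normal(0.0, 0.5, size=5000)
grid = np.linspace(-6, 6, 200)
density_gap = (gaussian_kernel_density(y_treated, grid, 0.3)
               - gaussian_kernel_density(y_control, grid, 0.3))
print(density_gap.max(), density_gap.min())
```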
“Monkey see monkey do” is an age-old adage, referring to naive imitation without a deep understanding of a system’s underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (See), attempting to directly reproduce the demonstrator’s behavior (Do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between demonstrator and imitator has been studied in the literature under the rubric of causal imitation learning (Zhang et al., 2020), but existing solutions are limited to single-stage decision-making. This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode. We develop a graphical criterion that is both necessary and sufficient for determining the feasibility of causal imitation, providing conditions when an imitator can match a demonstrator’s performance despite differing capabilities. Finally, we provide an efficient algorithm for determining imitability and corroborate our theory with simulations.
The Ladder of Causation describes three qualitatively different types of activities an agent may be interested in engaging in, namely, seeing (observational), doing (interventional), and imagining (counterfactual) (Pearl and Mackenzie, 2018). The inferential challenge imposed by the causal hierarchy is that data is collected by an agent observing or intervening in a system (layers 1 and 2), while its goal may be to understand what would have happened had it taken a different course of action, contrary to what factually ended up happening (layer 3). While there exists a solid understanding of the conditions under which cross-layer inferences are allowed from observations to interventions, the results are somewhat scarcer when targeting counterfactual quantities. In this paper, we study the identification of nested counterfactuals from an arbitrary combination of observations and experiments. Specifically, building on a more explicit definition of nested counterfactuals, we prove the counterfactual unnesting theorem (CUT), which allows one to map arbitrarily nested counterfactuals to unnested ones. For instance, applications in mediation and fairness analysis usually evoke notions of direct, indirect, and spurious effects, which naturally require nesting. Second, we introduce a sufficient and necessary graphical condition for counterfactual identification from an arbitrary combination of observational and experimental distributions. Lastly, we develop an efficient and complete algorithm for identifying nested counterfactuals; if the algorithm fails to return an expression for a query, the query is not identifiable.
Causal effect identification is concerned with determining whether a causal effect is computable from a combination of qualitative assumptions about the underlying system (e.g., a causal graph) and distributions collected from this system. Many identification algorithms exclusively rely on graphical criteria made of a non-trivial combination of probability axioms, do-calculus, and refined c-factorization (e.g., Lee & Bareinboim, 2020). In a sequence of increasingly sophisticated results, it has been shown how proxy variables can be used to identify certain effects that would not be otherwise recoverable in challenging scenarios through solving matrix equations (e.g., Kuroki & Pearl, 2014; Miao et al., 2018). In this paper, we develop a new causal identification algorithm that utilizes both graphical criteria and matrix equations. Specifically, we first characterize the relationships between certain graphically-driven formulae and matrix multiplications. With such characterizations, we broaden the spectrum of proxy variable-based identification conditions and further propose novel intermediary criteria based on the pseudoinverse of a matrix. Finally, we devise a causal effect identification algorithm, which accepts as input a collection of marginal, conditional, and interventional distributions, integrating enriched matrix-based criteria into a graphical identification approach.
Bayesian decision-making under misspecified priors with applications to meta-learning
Max Simchowitz (Massachusetts Institute of Technology), Christopher Tosh (Columbia University), Akshay Krishnamurthy (Microsoft Research NYC), Daniel Hsu (Columbia University), Thodoris Lykouris (Massachusetts Institute of Technology), Miro Dudik (Microsoft Research NYC), Robert E. Schapire (Microsoft Research NYC)
Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecification. We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most Õ(H²ϵ) from TS with a well-specified prior, where ϵ is the total-variation distance between priors and H is the learning horizon. Our bound does not require the prior to have any parametric form. For priors with bounded support, our bound is independent of the cardinality or structure of the action space, and we show that it is tight up to universal constants in the worst case. Building on our sensitivity analysis, we establish generic PAC guarantees for algorithms in the recently studied Bayesian meta-learning setting and derive corollaries for various families of priors. Our results generalize along two axes: (1) they apply to a broader family of Bayesian decision-making algorithms, including a Monte-Carlo implementation of the knowledge gradient algorithm (KG), and (2) they apply to Bayesian POMDPs, the most general Bayesian decision-making setting, encompassing contextual bandits as a special case. Through numerical simulations, we illustrate how prior misspecification and the deployment of one-step look-ahead (KG) can impact the convergence of meta-learning in multi-armed and contextual bandits with structured and correlated priors.
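For readers unfamiliar with Thompson sampling, the sketch below shows the plain algorithm on a Bernoulli bandit with Beta priors. It is only a minimal illustration of where the prior enters the algorithm, not the paper's analysis; the arm means, horizon, and prior parameters are made up for the example.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, prior_alpha, prior_beta, seed=0):
    """Run Thompson sampling on a Bernoulli bandit with Beta priors.

    `prior_alpha` and `prior_beta` encode the (possibly misspecified) prior for
    each arm; the paper's bound says the reward lost to misspecification scales
    with the total-variation distance between this prior and the true one.
    """
    rng = np.random.default_rng(seed)
    alpha = np.array(prior_alpha, dtype=float)
    beta = np.array(prior_beta, dtype=float)
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a mean for each arm from its posterior and play the argmax.
        sampled_means = rng.beta(alpha, beta)
        arm = int(np.argmax(sampled_means))
        reward = float(rng.random() < true_means[arm])
        total_reward += reward
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
    return total_reward

# Compare a roughly well-specified prior against a badly misspecified one.
means = [0.3, 0.7]
well = thompson_sampling_bernoulli(means, horizon=2000, prior_alpha=[1, 2], prior_beta=[2, 1])
mis = thompson_sampling_bernoulli(means, horizon=2000, prior_alpha=[50, 1], prior_beta=[1, 50])
print(well, mis)
```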
The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector. In this paper, we explore the generality of this phenomenon and make the following contributions. First, we prove a super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models, matching the upper bounds from previous works. We further identify a sharp phase transition in Gaussian feature models, bound the width of this transition, and give experimental support for its universality. Finally, we hypothesize that this phase transition occurs only in much higher-dimensional settings in the ℓ1 variant of the SVM, and we present a new geometric characterization of the problem that may elucidate this phenomenon for the general ℓp case.
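The phenomenon is easy to probe empirically. Below is a minimal sketch, not the paper's experimental setup, that fits a nearly hard-margin linear SVM on isotropic Gaussian features and reports the fraction of training points that become support vectors as the dimension grows; the sample size, dimensions, label model, and large-C approximation are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def support_vector_fraction(n, d, seed=0):
    """Fraction of training points that are support vectors for a (nearly)
    hard-margin linear SVM, with Gaussian features and labels from a linear
    ground-truth direction."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(d)
    y = np.sign(X @ w)                            # linearly separable by construction
    svm = SVC(kernel="linear", C=1e8).fit(X, y)   # large C approximates a hard margin
    return len(svm.support_) / n

# In low dimensions only a few points sit on the margin; once the dimension is
# much larger than the sample size, (nearly) every point becomes a support
# vector: the "support vector proliferation" the abstract refers to.
for d in [16, 128, 1024, 8192, 65536]:
    print(d, support_vector_fraction(n=64, d=d))
```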
Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models. To reduce the huge amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds. During training, our method assumes no ground-truth pose annotations, no CAD models, and no multi-view supervision. The key to our method is to disentangle shape and pose through an invariant shape reconstruction module and an equivariant pose estimation module, empowered by SE(3) equivariant point cloud networks. The invariant shape reconstruction module learns to perform aligned reconstructions, yielding a category-level reference frame without using any annotations. In addition, the equivariant pose estimation module achieves category-level pose estimation accuracy that is comparable to some fully supervised methods. Extensive experiments demonstrate the effectiveness of our approach on both complete and partial depth point clouds from the ModelNet40 benchmark, and on real depth point clouds from the NOCS-REAL 275 dataset. The project page with code and visualizations can be found at: dragonlong.github.io/equi-pose.
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
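As background, an input convex neural network keeps its output convex in the input by constraining the hidden-to-hidden weights to be non-negative and using a convex, non-decreasing activation; the gradient of such a convex potential is one way to parameterize a monotone, Brenier-style map. The sketch below shows only this generic architecture under those assumptions, not the paper's model, objectives, or how bijectivity is enforced, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputConvexNet(nn.Module):
    """A small fully input-convex network: its scalar output is convex in x.

    Convexity follows from clamping the weights that act on hidden activations
    to be non-negative and using a convex, non-decreasing activation (softplus).
    """

    def __init__(self, dim, hidden=64, depth=3):
        super().__init__()
        self.input_layers = nn.ModuleList([nn.Linear(dim, hidden) for _ in range(depth)])
        self.hidden_layers = nn.ModuleList([nn.Linear(hidden, hidden, bias=False) for _ in range(depth - 1)])
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        z = F.softplus(self.input_layers[0](x))
        for inp, hid in zip(self.input_layers[1:], self.hidden_layers):
            # Non-negative hidden-to-hidden weights preserve convexity in x.
            z = F.softplus(inp(x) + F.linear(z, hid.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0), self.out.bias)

# The induced map x -> grad_x f(x) is monotone (a candidate transport map).
f = InputConvexNet(dim=2)
x = torch.randn(5, 2, requires_grad=True)
potential = f(x).sum()
(mapped,) = torch.autograd.grad(potential, x)
print(mapped.shape)
```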
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations that are rich enough to benefit a variety of downstream tasks. We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance on the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval. Furthermore, we study a modality-agnostic single-backbone Transformer by sharing weights among the three modalities. We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks. Notably, VATT’s vision Transformer achieves top-1 accuracies of 82.1% on Kinetics-400, 83.6% on Kinetics-600, 72.7% on Kinetics-700, and 41.1% on Moments in Time, setting new records while avoiding supervised pre-training. Transferring to image classification leads to 78.7% top-1 accuracy on ImageNet compared to 64.7% by training the same Transformer from scratch, showing the generalizability of our model despite the domain gap between videos and images. VATT’s audio Transformer also sets a new record on waveform-based audio event recognition by achieving an mAP of 39.4% on AudioSet without any supervised pre-training.
Deep learning systems frequently fail at out-of-context (OOC) prediction, the problem of making reliable predictions on uncommon or unusual inputs or subgroups of the training distribution. To this end, a number of benchmarks for measuring OOC performance have been recently introduced. In this work, we introduce a framework unifying the literature on OOC performance measurement, and demonstrate how rich auxiliary information can be leveraged to identify candidate sets of OOC examples in existing datasets. We present NOOCh: a suite of naturally-occurring “challenge sets”, and show how varying notions of context can be used to probe specific OOC failure modes. Experimentally, we explore the tradeoffs between various learning approaches on these challenge sets and demonstrate how the choices made in designing OOC benchmarks can yield varying conclusions.
Variational Model Inversion Attacks
Kuan-Chieh Wang (University of Toronto), Yan Fu (University of Toronto), Ke Li (Simon Fraser University), Ashish Khisti (University of Toronto), Richard Zemel (Columbia University), Alireza Makhzani (University of Toronto)
Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset. Empirically, our method substantially improves performance in terms of target attack accuracy, sample realism, and diversity on datasets of faces and chest X-ray images.
Workshop
Causal Inference & Machine Learning: Why now?
Elias Bareinboim (Columbia University), Bernhard Schölkopf (Max Planck Institute for Intelligent Systems), Terrence Sejnowski (Salk Institute), Yoshua Bengio (University of Montreal), Judea Pearl (University of California, Los Angeles)
Machine Learning has been extremely successful throughout many critical areas, including computer vision, natural language processing, and game-playing. Still, a growing segment of the machine learning community recognizes that fundamental pieces are missing from the AI puzzle, among them causal inference.
This recognition comes from the observation that even though causality is a central component found throughout the sciences, engineering, and many other aspects of human cognition, explicit reference to causal relationships is largely missing in current learning systems. This entails a new goal of integrating causal inference and machine learning capabilities into the next generation of intelligent systems, thus paving the way towards higher levels of intelligence and human-centric AI. The synergy goes in both directions: causal inference benefits from machine learning, and vice versa. Current machine learning systems lack the ability to leverage the invariances imprinted by the underlying causal mechanisms towards reasoning about generalizability, explainability, interpretability, and robustness. Current causal inference methods, on the other hand, lack the ability to scale up to high-dimensional settings, where current machine learning systems excel.
The goal of this workshop is to bring together researchers from both camps to initiate principled discussions about the integration of causal reasoning and machine learning perspectives to help tackle the challenging AI tasks of the coming decades. We welcome researchers from all relevant disciplines, including but not limited to computer science, cognitive science, robotics, mathematics, statistics, physics, and philosophy.
The Distinguished Lecture series brings computer scientists to Columbia to discuss current issues and research that are affecting their particular fields.
This year, four experts covered topics on how machine learning is used in drug discovery, software testing, RNA splicing, and surrogate loss functions:
Regina Barzilay (MIT) – Modeling Chemistry for Drug Discovery: Current State and Unsolved Challenges
Koushik Sen (UC Berkeley) – Automated Test Generation: A Journey from Symbolic Execution to Smart Fuzzing and Beyond
Oded Regev (Courant Institute, New York University) – Using Machine Learning for Scientific Discovery in Biology
Shivani Agarwal (University of Pennsylvania) – Surrogate Loss Functions in Machine Learning: What are the Fundamental Design Principles?
Below are a couple of the lectures, given by prominent faculty from universities across the country.
Columbia Engineering professor Henning Schulzrinne unpacks President Biden’s $1 trillion infrastructure bill and its promise to expand broadband access for people in rural and low-income areas.
Computer science research is about solving problems with computational tools – it could be how to predict where the next flu outbreak will occur, how robots can make life easier for senior citizens, or how to fight misinformation on social media. But while computer science (CS) researchers have the technical know-how, they still need to collaborate with people on the ground who know the particular problem or situation.
A group of graduate students from various institutions and disciplines (CS, Economics, and Operations Research, to name a few) recognized the gap and the need for connections and collaboration between the different groups. And so, Mechanism Design for Social Good (MD4SG) was born in 2016, co-founded by Rediet Abebe and Kira Goldner. From a 12-member reading group, the multi-institutional initiative expanded to 2,000 participants involved in working groups, colloquium series, tutorials, and workshops at the ACM Conference on Economics and Computation (EC ’17 and EC ’18).
The conference highlighted research where CS, economics, operations research, and social and humanistic sciences intersect and help improve equity and access for historically disadvantaged and underserved communities. A number of Best Paper and Poster Awards were presented at the digital conference.
Ana-Andreea Stoica
We caught up with Ana-Andreea Stoica to find out more about the conference and why it is important to develop multi-disciplinary research opportunities.
What happened to make you realize that the MD4SG workshops could be expanded into a conference? How did the EAAMO conference come about?
Our technical workshop series has grown steadily since its first iteration in 2017. In 2020, we held the first standalone workshop, which drew over 130 submissions. Given the rapid growth and broadened scope, we decided to start this conference series to better include all fields relevant to our mission of bridging research and practice to improve access to opportunity for marginalized communities (e.g., Economics, Operations Research, Computer Science, Sociology, Law). Rediet Abebe, Irene Lo, and I served as Program Co-Chairs for this inaugural conference, working closely with our General Co-Chairs, Illenin Kondo and Francisco Marmolejo-Cossio, in organizing the first EAAMO conference.
How is the conference different from the MD4SG workshops?
The conference series is a natural continuation of the MD4SG workshop series, given the growth in size and scope since its inception. The conference aims to be inclusive of all the fields that create research related to the mission of our organization, including Economics, Operations Research, Computer Science, Sociology, and Law, among others. The conference also serves as a publishing venue for such research – as an ACM-sponsored conference, our archival track includes papers published with proceedings in the ACM Digital Library.
How is the conference creating a space for publishing research that relates to your mission?
EAAMO’21 aims to open avenues for creating and sharing research at the intersection of all the fields I mentioned through both the archival and non-archival tracks. In particular, original research can be published in the ACM Digital Library, where it can be recognized and shared in the research community. We hope that EAAMO can serve our community as a space for interdisciplinary research, in particular for the unique ideas and projects that aim to apply computational tools and humanistic methodologies in improving access to opportunities for marginalized groups.
Why does the group aim to connect computer scientists with other non-computational groups such as non-profits and the public sector?
EAAMO’21 aims to foster an interdisciplinary community that can bridge research and practice in tackling topics such as access to education and healthcare, interventions for poverty alleviation, fairness and privacy in labor markets or data markets, and many other topics related to underserved communities.
To this end, working with non-profits, the public sector, and practitioners is crucial in order to understand the main issues at stake in each of these applications and to construct research-to-practice pipelines that have an impact on the communities we aim to center at the core of our research agenda. The success of our workshop series and previous and ongoing projects relies on this multi-disciplinary approach and on engaging domain experts working in non-profit organizations, municipalities, and companies. Domain-centered interdisciplinary work has always been the focus of MD4SG activities.
Since its inception, MD4SG has organized various working groups in which students, researchers, and practitioners work on particular topics of interest. Our current working groups each range from 15 to 100+ people in size and hold bi-weekly meetings with talks, discussions, and publication goals. Our groups have fostered cross-domain collaborations that led to several publications. As of Fall 2020, MD4SG has also organized working groups around specific geographical regions to foster collaborations on topics of relevance related to mechanism design for social good.
How will the conference facilitate these collaborations?
EAAMO’21 featured keynote talks from leading academics and practitioners in domains related to the conference theme; presentations of submitted papers, problem pitches, datasets, and software demonstrations by participants; problem pitches and product demonstrations from domain experts and practitioners; as well as thematic policy & practice discussion panels with practitioners focused on Latin American topics and on migration and asylee topics.
Are you working on any projects that resulted from the MD4SG workshops and EAAMO? Please describe them and how they are going.
Definitely, our working groups are continuously working on projects that stemmed from our work together in MD4SG as well as from the MD4SG workshops. A recent paper that came out of the MD4SG Working Group on Bias and Discrimination can be found here. Other projects currently ongoing are related to provisions for social goods (in the Inequality Working Group for example). My co-organizers have several projects published and ongoing, for example, from the Data Economies Working Group, found on this page.
How can people become part of MD4SG?
We encourage people who are interested in joining MD4SG to subscribe to our (low-volume) listserv, where we post opportunities to join working groups, events, collaborations, and related activities. Our website contains a detailed description of all of our activities as well.
Professor Henning Schulzrinne unpacks the infrastructure bill and how it will expand broadband access for Americans.
The $1 trillion Bipartisan Infrastructure Deal will deliver $65 billion to improve internet access. Currently, 30 million Americans live in areas where broadband internet is not available, such as rural areas and lower-income urban areas. The plan is to build broadband networks, improve internet infrastructure, and help lower internet service costs. We asked Henning Schulzrinne, the Julian Clarence Levi Professor of computer science and an internet networks expert, how the bill will impact internet access.
Q: What is the current state of internet access in the US? Why is it important that the bill allots $65 billion to improve access for rural areas, low-income families, and tribal communities?
Internet access has two facets: availability and adoption. Currently, there is no precise data on how many homes have access to basic fixed internet service, defined as a download speed of 25 megabits per second. (That is much slower than most current cable or fiber offerings.) A recent effort using a variety of data sources estimates that about 93% of households could subscribe to basic broadband or faster, leaving about 14.3 million households without access except via expensive and unreliable satellite service or very slow DSL. But only about 77% of adults use the internet at home (“adoption”). Affordability is an important reason for the discrepancy between availability and adoption.
The bill is the first large-scale effort to address both availability and adoption; earlier efforts largely provided money to rural areas to build out broadband internet, about $5B a year, but did not address affordability except for the emergency broadband benefit program started in May 2021.
Q: How far behind is the US when it comes to broadband compared to the rest of the world?
Almost all large countries struggle with making high-speed broadband available to rural areas. But many other countries have lower prices and more competition for broadband service, maybe explaining why the United States ranks 16th out of 38 OECD countries. The United States ranks 13th worldwide on average broadband speed, but such comparisons can be difficult and the differences are not that large among the top 20 countries.
Q: Why is broadband not available in most rural areas?
Most rural areas have some broadband, typically using older technology based on phone lines (DSL = digital subscriber line). However, it can be quite slow and connections are often overloaded and unreliable. Only about 67% of rural households have access to higher speeds of 100 Mb/s that are typical in urban areas. The reasons are complex: cable companies provide most high-speed broadband in the United States, but have largely chosen not to build out in rural areas. Telephone companies have relied on their old phone lines to provide broadband service, with limited investment in modern fiber technology. Since houses are further apart and since disposable incomes are often lower, private investment in rural broadband has not been considered sufficiently profitable; thus, much of the rural broadband deployment has been subsidized by various federal programs. Many of these programs have been supporting broadband that is now considered obsolete.
Q: It also contains $1 billion for enabling the build-out of “middle mile” broadband infrastructure. What is this and how can it help?
The internet infrastructure can be roughly divided into the backbone network connecting major cities, middle mile networks going from those cities to smaller population centers such as county seats, and access or “last-mile” networks that connect homes to the internet. Many smaller communities do not have good fiber connections, or have only one expensive provider. Adding more regional middle mile networks allows smaller network operators to build out access networks, as such small operators cannot afford to build their own fiber network to the next large city.
Q: The bill offers an additional $2.75 billion for digital equity and inclusion efforts, which could end digital redlining. What is redlining? Do you think the bill can help with the issue?
Providers in urban areas have been accused of failing to upgrade slower broadband networks in lower-income urban areas. Competitors such as fiber providers often don’t build out new networks in such areas, either. The lower speeds and higher prices for such neighborhoods are referred to as digital redlining. It is not quite clear yet what kind of projects will be funded. There are promising ideas of providing free Wi-Fi in lower-income apartment buildings, for example.
Q: It seems that the pandemic has been helpful in revealing how inadequate broadband service is in the US. Can you talk about the key findings of your NSF Broadband Research 2020 Report and if the infrastructure bill will actually help achieve those goals?
The NSF Broadband Research Report emphasizes the need to consider measuring and addressing both availability and adoption, including providing training and devices, so the infrastructure bill offers many of the tools envisioned in the report. However, the report is largely about research questions and recommendations for facilitating such research, not policy mechanisms. Even with new, substantial funding, we have to make sure that the programs are effective and reach the right people. For example, the report recommends that all broadband agencies gather and release data as these programs are initiated so that we can learn from successes.
Q: How happy are you with the infrastructure bill? Do you think that it will help fast track the broadband situation in the US? Prior to its passing, how long do you think it would have taken the US to catch up?
The bill is really the first large-scale, all-in, and comprehensive attempt to finally address broadband availability and affordability. It is both a visionary and necessary step towards digital inclusion. My main concern is implementation and coordination. For example, the bill relies on private entities, from for-profit companies to electric cooperatives, to deploy broadband, but cannot force companies to build out everywhere or use the best long-term technology. Grants are made to states, which may not have the institutional capacity to ensure that the most efficient organizations build out networks that will still be sufficient to meet local broadband needs 20 years from now. We want to avoid having to spend another few tens of billions of dollars of taxpayer money ten years from now, after all.
Since the effort is very state-centric, making it possible for researchers and public interest organizations to monitor and evaluate the build-out and digital inclusion efforts will be challenging. (My research group is currently attempting to analyze the existing, much smaller, subsidy efforts, run by two federal agencies, and finding it quite challenging to get a good picture of the impact.)
Q: What are the positive effects that you see will come out of this effort?
Broadband has become a must-have infrastructure for any community, just like clean water or reliable electricity. For education, universal broadband will make it much easier to provide the same learning experience to everyone. Right now, teachers often cannot assign projects or homework that relies on internet resources since not all students have easy access. Continuing education and training will become a bit easier for adults looking to gain new skills. Rural areas lack access to specialists and mental health resources; telemedicine can bridge at least some of these gaps. Some rural areas located within maybe a hundred miles of major cities may be able to attract younger residents who can now work from home and only drive to their office occasionally. Many small businesses need reliable, high-speed internet to offer their goods and services.
That said, I would not expect it to fix all societal challenges – broadband access is necessary and even helpful for education, health care, and public services, but it is not a replacement for providing high-quality education, health care, and public services more generally.
Abstract: High-velocity dynamic actions (e.g., fling or throw) play a crucial role in our everyday interaction with deformable objects by improving our efficiency and effectively expanding our physical reach range. Yet, most prior works have tackled cloth manipulation using exclusively single-arm quasi-static actions, which requires a large number of interactions for challenging initial cloth configurations and strictly limits the maximum cloth size by the robot’s reach range. In this work, we demonstrate the effectiveness of dynamic flinging actions for cloth unfolding with our proposed self-supervised learning framework, FlingBot. Our approach learns how to unfold a piece of fabric from arbitrary initial configurations using a pick, stretch, and fling primitive for a dual-arm setup from visual observations. The final system achieves over 80% coverage within 3 actions on novel cloths, can unfold cloths larger than the system’s reach range, and generalizes to T-shirts despite being trained on only rectangular cloths. We also finetuned FlingBot on a real-world dual-arm robot platform, where it increased the cloth coverage over 4 times more than the quasi-static baseline did. The simplicity of FlingBot combined with its superior performance over quasi-static baselines demonstrates the effectiveness of dynamic actions for deformable object manipulation.
Abstract: Robots should be able to report in natural language what they have done. They should provide concise summaries, respond to questions about them, and be able to learn from the natural language responses they receive to their summaries. We propose that developing the capabilities for robots to summarize their actions is a new and necessary challenge that should be taken up by the robotic learning community. We propose an initial framework for robot action summarization, presented as a set of tasks that can serve as a target for research and a measure of progress.
Abstract: Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical.
While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and their containers will cause an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.
Papers from CS researchers were accepted to the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). A paper from the Spoken Language Processing Group also won the Best Short Paper Award.
Humor detection has gained attention in recent years due to the desire to understand user-generated content with figurative language. However, substantial individual and cultural differences in humor perception make it very difficult to collect a large-scale humor dataset with reliable humor labels. We propose CHoRaL, a framework to generate perceived humor labels on Facebook posts, using the naturally available user reactions to these posts with no manual annotation needed. CHoRaL provides both binary labels and continuous scores of humor and non-humor. We present the largest dataset to date with labeled humor on 785K posts related to COVID-19. Additionally, we analyze the expression of COVID-related humor in social media by extracting lexico-semantic and affective features from the posts, and build humor detection models with performance similar to humans. CHoRaL enables the development of large-scale humor detection models on any topic and opens a new path to the study of humor on social media.
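As a rough illustration of how reactions can be turned into labels without manual annotation, the sketch below scores a post by the smoothed share of “haha” reactions among all reactions. This is a hypothetical proxy chosen for the example; the exact label definition used by CHoRaL may differ, and the reaction counts are invented.

```python
# Hypothetical scoring function: the real CHoRaL label definition may differ.
def humor_score(reactions, smoothing=1.0):
    """Map a post's reaction counts to a continuous humor score in (0, 1).

    `reactions` is a dict of Facebook reaction counts, e.g.
    {"haha": 40, "like": 100, "love": 5, "sad": 0, "angry": 1, "wow": 2}.
    The score is the smoothed share of 'haha' reactions among all reactions,
    an illustrative proxy for perceived humor.
    """
    haha = reactions.get("haha", 0)
    total = sum(reactions.values())
    return (haha + smoothing) / (total + 2 * smoothing)

print(humor_score({"haha": 40, "like": 100, "love": 5}))   # higher score: many 'haha' reactions
print(humor_score({"haha": 0, "like": 80, "sad": 30}))     # lower score: no 'haha' reactions
```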
Dialogue summarization comes with its own peculiar challenges as opposed to news or scientific article summarization. In this work, we explore four different challenges of the task: handling and differentiating parts of the dialogue belonging to multiple speakers, negation understanding, reasoning about the situation, and informal language understanding. Using a pretrained sequence-to-sequence language model, we explore speaker name substitution, negation scope highlighting, multi-task learning with relevant tasks, and pretraining on in-domain data. Our experiments show that our proposed techniques indeed improve summarization performance, outperforming strong baselines.
Timeline Summarization identifies major events from a news collection and describes them following temporal order, with key dates tagged. Previous methods generally generate summaries separately for each date after they determine the key dates of events. These methods overlook the events’ intra-structures (arguments) and inter-structures (event-event connections). Following a different route, we propose to represent the news articles as an event-graph, thus the summarization task becomes compressing the whole graph to its salient sub-graph. The key hypothesis is that the events connected through shared arguments and temporal order depict the skeleton of a timeline, containing events that are semantically related, structurally salient, and temporally coherent in the global event graph. A time-aware optimal transport distance is then introduced for learning the compression model in an unsupervised manner. We show that our approach significantly improves the state of the art on three real-world datasets, including two public standard benchmarks and our newly collected Timeline100 dataset.
Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-sourced parallel poetic corpora, and to the intrinsic complexities involved in preserving the semantics, style, and figurative nature of poetry. We present an empirical investigation for poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-language-family models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore, COMET) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.
Enthymemes are defined as arguments where a premise or conclusion is left implicit. We tackle the task of generating the implicit premise in an enthymeme, which requires not only an understanding of the stated conclusion and premise, but also additional inferences that could depend on commonsense knowledge. The largest available dataset for enthymemes (Habernal et al., 2018) consists of 1.7k samples, which is not large enough to train a neural text generation model. To address this issue, we take advantage of a similar task and dataset: Abductive reasoning in narrative text (Bhagavatula et al., 2020). However, we show that simply using a state-of-the-art seq2seq model fine-tuned on this data might not generate meaningful implicit premises associated with the given enthymemes. We demonstrate that encoding discourse-aware commonsense during fine-tuning improves the quality of the generated implicit premises and outperforms all other baselines both in automatic and human evaluations on three different datasets.
Practical dialogue systems require robust methods of detecting out-of-scope (OOS) utterances to avoid conversational breakdowns and related failure modes. Directly training a model with labeled OOS examples yields reasonable performance, but obtaining such data is a resource-intensive process. To tackle this limited-data problem, previous methods focus on better modeling the distribution of in-scope (INS) examples. We introduce GOLD as an orthogonal technique that augments existing data to train better OOS detectors operating in low-data regimes. GOLD generates pseudo-labeled candidates using samples from an auxiliary dataset and keeps only the most beneficial candidates for training through a novel filtering mechanism. In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics, achieving relative gains of 52.4%, 48.9% and 50.3% against median baseline performance. We also analyze the unique properties of OOS data to identify key factors for optimally applying our proposed method.
Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities over time without incurring the high cost of retraining the whole system. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings: intent recognition, state tracking, natural language generation, and end-to-end. Moreover, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform comparably well, but both achieve inferior performance to the multi-task learning baseline, in which all the data are shown at once, showing that continual learning in task-oriented dialogue systems is a challenging task. Furthermore, we reveal several trade-offs between different continual learning methods in terms of parameter usage and memory size, which are important in the design of a task-oriented dialogue system. The proposed benchmark is released together with several baselines to promote more research in this direction.
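For context, a residual adapter is a small bottleneck module inserted into a frozen backbone so that only a few parameters are trained per new domain. The sketch below shows the generic module, not the paper's exact configuration; the hidden size, bottleneck width, and per-domain dictionary are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """A small bottleneck adapter added on top of a frozen backbone layer.

    Only the adapter parameters are trained for each new domain, so adding a
    domain does not require retraining the whole dialogue model.
    """

    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states):
        # Residual connection: the adapter learns a small correction on top of
        # the frozen layer's output.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# One adapter per domain; the backbone stays frozen while adapters are trained.
adapters = nn.ModuleDict({"hotel": ResidualAdapter(), "restaurant": ResidualAdapter()})
h = torch.randn(2, 10, 768)   # (batch, sequence, hidden) activations from a frozen layer
print(adapters["hotel"](h).shape)
```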
Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer the cross-task knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multiple-choice QA via a text-to-text transformer framework, and tracks both categorical slots and non-categorical slots in DST. In addition, we introduce two effective ways to construct unanswerable questions, namely, negative question sampling and context truncation, which enable our model to handle “none” value slots in the zero-shot DST setting. The extensive experiments show that our approaches substantially improve the existing zero-shot and few-shot results on MultiWOZ. Moreover, compared to the fully trained baseline on the Schema-Guided Dialogue dataset, our approach shows better generalization ability in unseen domains.
Despite the recent success of large-scale language models on various downstream NLP tasks, the repetition and inconsistency problems still persist in dialogue response generation. Previous approaches have attempted to avoid repetition by penalizing the language model’s undesirable behaviors in the loss function. However, these methods focus on token-level information and can lead to incoherent responses and uninterpretable behaviors. To alleviate these issues, we propose to apply reinforcement learning to refine an MLE-based language model without user simulators, and distill sentence-level information about repetition, inconsistency and task relevance through rewards. In addition, to better accomplish the dialogue task, the model learns from human demonstration to imitate intellectual activities such as persuasion, and selects the most persuasive responses. Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation results on a donation persuasion task, and generates more diverse, consistent and persuasive conversations according to the user feedback.
Large language models benefit from training with a large amount of unlabeled text, which gives them increasingly fluent and diverse generation capabilities. However, using these models for text generation that takes into account target attributes, such as sentiment polarity or specific topics, remains a challenge. We propose a simple and flexible method for controlling text generation by aligning disentangled attribute representations. In contrast to recent efforts on training a discriminator to perturb the token-level distribution for an attribute, we use the same data to learn an alignment function to guide the pre-trained, non-controlled language model to generate texts with the target attribute without changing the original language model parameters. We evaluate our method on sentiment- and topic-controlled generation, and show large performance gains over previous methods while retaining fluency and diversity.
Recommendation dialogs require the system to build a social bond with users to gain trust and develop affinity in order to increase the chance of a successful recommendation. It is beneficial to divide up such conversations into multiple subgoals (such as social chat, question answering, recommendation, etc.), so that the system can retrieve appropriate knowledge with better accuracy under different subgoals. In this paper, we propose a unified framework for common knowledge-based multi-subgoal dialog: the knowledge-enhanced multi-subgoal driven recommender system (KERS). We first predict a sequence of subgoals and use them to guide the dialog model to select knowledge from a subset of an existing knowledge graph. We then propose three new mechanisms to filter noisy knowledge and to enhance the inclusion of cleaned knowledge in the dialog response generation process. Experiments show that our method obtains state-of-the-art results on the DuRecDial dataset in both automatic and human evaluation.
Meet Henry Yuen, a computer scientist exploring the boundaries between classical and quantum computers. Yuen joined Columbia Engineering as an assistant professor in January 2021.
PhD students will review an applicant’s Personal Statement as part of the Pre-Submission Application Review (PAR) Program.
A group of PhD students wants to reduce the inequities in the department’s PhD application process. They will help applicants to the PhD program by lending their expertise and reviewing personal statements. This initiative, called the Pre-Submission Application Review (PAR) Program, is in its second year.
“It is clear that students from underrepresented groups may further benefit from mentorship through the entirety of the process of applying, to deciding, to ultimately entering grad school,” said Sam Fereidooni, a first-year PhD student and PAR Program coordinator. The group plans to organize further mentorship opportunities in future iterations of the program, such as spaces where students can engage in conversations with a supportive community of their peers, as well as with current PhD students and faculty members.
“Ultimately, we are trying to provide resources to support underrepresented people in CS, with the goal of addressing inequality in representation,” said Samir Gadre, a 2nd-year PhD student and PAR Program coordinator. The group sees the importance of continuing the program because the status quo does not change quickly. It is a feeling shared at other universities – students at Stanford University and the Massachusetts Institute of Technology started similar programs in 2020 as well. Said Gadre, “We feel that PAR programs across the country are a good first step. However, we also recognize that more student and faculty activism, particularly from people in positions of power, is necessary to create meaningful institutional change.”
By continuing the program, the group hopes to address the systemic disadvantages people from underrepresented communities face, lending a hand and giving advice on how to write a personal statement that will stand out and get the attention of professors.
“Above all applicants must do research on potential faculty that they would like to work with,” said Kahlil Dozier, a 2nd-year PhD student and PAR Program coordinator. Even if an applicant is not completely sure what their intended research area is, it is better to mention specific faculty that may align with their interests in their application. This is one of the most critical pieces of advice; an application will likely get referred to the names mentioned, and those professors may be the ones deciding if the applicant is a suitable candidate for admission.
And it is not enough to just mention faculty in the application – potential students should actually look at the recent work those faculty have done and read their papers. A PhD can take five to seven years to complete, so applicants should see if it is the type of work they actually want to dedicate their graduate research career to. Continued Dozier, “If you have done this, it will inevitably come through in your personal statement and bolster your application.”
Here are more points applicants should consider before writing a Personal Statement:
– The Personal Statement is a key part of the application; oftentimes, it is where an applicant can differentiate themselves from other applicants.
– In short, the intent is to convey a personal narrative, goals, and aspirations, and to offer a perspective that is fundamentally absent from a resume/CV.
– The application is constrained by limited space, so applicants need to focus on a few concrete experiences (broadly defined) that may have shaped the trajectory of their academic career up until this point, or even shaped them as researchers.
– Even though it is separate and serves a different function than the Research Statement of Purpose, research can still be involved. One approach to a personal statement is to turn one’s CV into a narrative, filling in what lies between the lines.
– Again, doing prior research on potential faculty can shine through here, and it is advantageous to show how a faculty member’s work aligns with the applicant’s background and goals.
Interested applicants have to apply to the PAR program and submit their personal statement and CV by November 7th at 11:59 pm EST. Because the program is student-run and dependent on volunteers, there is no guarantee that every applicant can be accommodated. Those who are accepted will be notified by November 14th, then paired with a PhD student in the same research area who will review their materials and provide feedback to them by November 21st – well ahead of the December 15th deadline to apply to the PhD program.
Postdoc Adam Li works with Professor Elias Bareinboim in the Causal Artificial Intelligence Lab. The program sponsors two-year postdoctoral research positions in computing for recent Ph.D. graduates.
Misra’s “Ask Here First” offers easy data access for all users and can be applied to any data-rich field, from sports and e-commerce to investment banking.
The small-grants program is designed to support faculty who contribute to the diversity goals of the University by their research, teaching, and mentoring activities.
Associate Professor Martha Kim received a Mid-Career Faculty Grant for her proposal, “Preparing Today’s Hardware for Tomorrow’s Software.” Her project aims to develop new and much-needed advances in general-purpose computing.
For her project, “Testing Deep Learning Based Autonomous Driving Controllers,” Assistant Professor Baishakhi Ray received a Junior Faculty Grant. Ray will use the funds to create an end-to-end testing framework for self-driving cars.
Professor Jason Nieh is one of 26 awardees that will have access to more than 250 Amazon public datasets and can utilize AWS AI/ML services and tools. Each award is intended to support the work of one to two graduate students or postdoctoral students for one year, under the supervision of a faculty member.
Giannis Karamanolakis, a natural language processing and machine learning PhD student, talks about his research projects and how he is developing machine learning techniques for natural language processing applications.
Sitia, Greece
Can you talk about your background and why you decided to pursue a PhD?
At the National Technical University of Athens (NTUA), taking part in machine learning (ML) research was not planned but rather a spontaneous outcome stemming from my love for music. The initial goal of my undergraduate thesis was to build an automatic music transcription system that converts polyphonic raw audio into music sheets. However, after realizing that such a system would not be possible to develop in a limited amount of time, I worked on the simpler task of automatically tagging audio clips with descriptive tags (e.g., “car horn” for audio clips where a car horn sounds). Right after submitting a new algorithm as a conference paper, I realized that I love doing ML research.
After NTUA, I spent one and a half years working as an ML engineer at a startup called Behavioral Signals, where we trained statistical models for the recognition of core emotions from speech and text data. After a few months of ML engineering, I found myself spending more time reading research papers and evaluating new research ideas on ML and natural language processing (NLP). By then, I was confident about my decision to pursue a PhD in ML/NLP.
What about NLP did you like and when did you realize that you wanted to do research on it?
I am fascinated by the ability of humans to understand complex natural language. While writing this response, I submitted the following 10-word query to Google: “when did you realize that you wanted to do research”, keeping the quotation marks so that Google looks for exact matches only. Can you guess the number of documents returned by Google that contain this exact sequence of 10 words?
The answer that I got was 0 (zero) documents, no results! In other words, Google, a company with huge collections of documents, did not detect any document that contains this specific sequence of words. Sentences rarely recur but humans easily understand the semantics of such rare sentences.
I decided to do research on NLP when I realized that current NLP algorithms are far away from human-level language understanding. As an example back from my time at Behavioral Signals, emotion classifiers were misclassifying sentences that contained sarcasm, negation, and other complex linguistic phenomena. I could not directly fix those issues (which are prevalent beyond emotion classification), which initially felt both surprising and frustrating, but then evolved into my excitement for research on NLP.
Why did you apply to Columbia and how was that process?
The computer science department at Columbia was one of my top choices for several reasons, but I will discuss only the first one here.
I was excited to learn about the joint collaboration between Columbia University and the New York City Department of Health and Mental Hygiene (DOHMH), on a project that aims to understand user-generated textual content in social media (e.g., Yelp reviews, tweets) for critical public health applications, such as detecting and acting on foodborne illness outbreaks in restaurants. I could see that the project would offer the unique opportunity to do research in ML and NLP and at the same time contribute to this important public application in collaboration with epidemiologists at DOHMH. Fortunately, I have been able to work on the project, advised by Professor Luis Gravano and Associate Professor Daniel Hsu.
Applying to Columbia and other American universities was quite a stressful experience. For many months, my days were filled with working for Behavioral Signals, studying hard for high scores in the GRE and TOEFL exams (both of which were required at that time by all US universities), creating a short CV for the first time, and writing a distinct statement of purpose for each university. I am glad to see the recent promising changes in the PhD application procedure for our department, such as waiving the GRE requirement and offering the Pre-Submission Application Review (PAR) program, in which current PhD students help applicants improve their applications. (I would have liked to take advantage of both.)
What sort of research questions or issues do you hope to answer?
My research in the past few years has focused on the following question: Can we effectively train ML classifiers for NLP applications with limited training data using alternative forms of human supervision?
An important limitation of current “supervised ML” techniques is that they require large amounts of training data, which are expensive and time-consuming to obtain manually. Thus, while supervised ML techniques (especially deep neural networks) thrive in standard benchmarks, they are often too expensive to apply to emerging real-world applications with limited labeled data.
Our work attempts to address the expensive requirement of manually labeled data through novel frameworks that leverage alternative, less expensive forms of human supervision. In sentiment classification, for example, we allow domain experts to provide a small set of domain-specific rules (e.g., the keyword “happy” indicates positive sentiment; “diarrhea” is a symptom of food poisoning). Under low-resource settings with no labeled data, can we leverage expert-defined rules as supervision for training state-of-the-art neural networks?
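To make the idea concrete, below is a minimal, hypothetical sketch of how keyword rules like these could be turned into weak labels for training a classifier; the rules, example reviews, and model choice are illustrative assumptions, not the actual framework used in this research.

```python
# Minimal sketch of rules-as-supervision (illustrative only; the rule set,
# data, and model below are assumptions, not the lab's actual framework).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Expert-defined keyword rules: 1 = positive sentiment, 0 = negative / sick report.
RULES = {"happy": 1, "delicious": 1, "diarrhea": 0, "vomited": 0}

def weak_label(text):
    """Return a weak label if any rule fires on the text, else None (abstain)."""
    tokens = set(text.lower().split())
    votes = [label for keyword, label in RULES.items() if keyword in tokens]
    if not votes:
        return None
    return int(sum(votes) * 2 >= len(votes))  # majority vote; ties count as positive

unlabeled_reviews = [
    "the staff was happy to help and the pasta was delicious",
    "i vomited an hour after eating here",
    "got diarrhea from the oysters, never again",
]

texts, labels = [], []
for review in unlabeled_reviews:
    y = weak_label(review)
    if y is not None:            # train only on reviews where some rule fired
        texts.append(review)
        labels.append(y)

X = TfidfVectorizer().fit_transform(texts)
clf = LogisticRegression().fit(X, labels)    # "student" classifier trained on weak labels
```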
For your research papers, how did you decide to do research on those topics? How long did it take you to complete the work? Was it easy?
For my first research project at Columbia, my goal was to help epidemiologists in health departments with daily inspections of restaurant reviews that discuss food poisoning events. Restaurant reviews can be quite long, with many irrelevant sentences surrounding the truly important ones that discuss food poisoning or relevant symptoms. Thus, we developed a neural network that highlights only important sentences in potentially long reviews and deployed it for inspections in health departments, where epidemiologists could quickly focus on the relevant sentences and safely ignore the rest.
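As a rough illustration of the sentence-highlighting idea (not the deployed neural system), the following toy sketch scores each sentence of a review with a simple keyword heuristic standing in for the trained model; the terms and the review are made up.

```python
# Toy sketch of sentence highlighting: a keyword scorer stands in for the
# neural sentence classifier described above (illustrative only).
import re

SYMPTOM_TERMS = {"sick", "vomiting", "vomited", "diarrhea", "nausea", "stomach", "food poisoning"}

def split_sentences(review):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]

def relevance_score(sentence):
    lowered = sentence.lower()
    return sum(term in lowered for term in SYMPTOM_TERMS)

review = ("The decor was lovely and the waiter was friendly. "
          "A few hours after the shrimp I started vomiting and had terrible stomach pain. "
          "Parking nearby was surprisingly easy to find.")

for sentence in split_sentences(review):
    marker = ">>" if relevance_score(sentence) > 0 else "  "
    print(marker, sentence)   # '>>' marks sentences an epidemiologist should read first
```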
Each project took about 6 months to complete. None of them were easy; each required substantial effort in reading relevant papers, discussing potential solutions with my advisors, implementing executable code, evaluating hypotheses on real data, and repeating the same process until we were all satisfied with the solutions and evaluation results. The projects also involved meeting with epidemiologists at DOHMH, re-designing our system to satisfy several (strict) data transfer protocols imposed by health departments, and overcoming several issues related to missing data for training ML classifiers.
Your advisors are not part of the NLP group, how has that worked out for you and your projects?
It has worked great in my humble opinion. For the public health project, the expertise of Professor Gravano on information extraction, combined with the expertise of Professor Hsu on machine learning, and the technical needs of the project have contributed without any doubt to the current formulation of our NLP-related frameworks. My advisors’ feedback covers a broad spectrum of research, ranging from core technical challenges to more general research practices, such as problem formulation and paper writing.
Among other things, I appreciate the freedom I have been given to explore new and interesting research questions, as well as the frequent and insightful feedback that helps me reframe questions and form solutions. At the same time, discussions with members of the NLP group, including professors and students, have been invaluable and have clearly influenced our projects.
What do you think is the most interesting thing about doing research?
I think it is the amount of surprise it involves. For many research problems that I have tried to tackle, I started by shaping an initial solution in my mind but then discovered surprising findings that changed my way of thinking – my initial solution did not actually work, simpler approaches worked better than more sophisticated ones, the data followed unexpected patterns, and so on. These moments of surprise turned research into an interesting experience, similar to solving riddles or listening to jazz music.
Please talk about your internships – the work you did, how was it, what did you learn?
In the summer of 2019, I worked at Amazon’s headquarters in Seattle with a team of more than 15 scientists and engineers. Our goal was to automatically extract and store knowledge about billions of products in a product knowledge graph. As part of my internship, we developed TXtract, a deep neural network that efficiently extracts information from product descriptions for thousands of product categories. TXtract has been a core component of Amazon’s AutoKnow, which provides the collected knowledge for Amazon search and product detail pages.
During the summer of 2020, I worked for Microsoft Research remotely from New York City (because of the pandemic). In collaboration with researchers at the Language and Information Technologies team, we developed a weak supervision framework that enables domain experts to express their knowledge in the form of rules and further integrates rules for training deep neural networks.
These two internships equipped me with invaluable experiences. I learned new coding tools, ML techniques, and research practices. Through the collaboration with different teams, I realized that even researchers who work on the same subfield may think in incredibly different ways, so to carry out a successful collaboration within a limited time, one needs to listen carefully, pre-define expected outcomes (with everyone in the team), and adapt fast.
Do you think your skills were improved by your time at Columbia? In which ways?
Besides having improved my problem-finding and -solving skills, I have expanded my presentation capabilities. In the beginning, I was frustrated when other people (even experienced researchers) could not follow my presentations and I was worried when I could not follow other presenters’ work. Later, I realized that if (at least part of) the audience is not able to follow a presentation, then the presentation is either flawed or has been designed for the wrong audience.
Over the past four years, I have presented my work at various academic conferences and workshops, symposiums at companies, and student seminars, and after having received constructive feedback from other researchers, I can say that my presentation skills have vastly improved. Without any doubt, I feel more confident and can explain my work to a broader type of audience with diverse expertise. That said, I’m still struggling to explain my PhD topic to my family. 🙂
What has been the highlight of your time at Columbia?
The first thing that comes to mind is the “Greek Happy Hour” that I co-organized in October 2019. More than 40 PhD students joined the happy hour, listened to Greek music (mostly “rempetika”), tasted Greek specialties (including spanakopita), and all toasted loudly by saying “Γειά μας” (ya mas; the Greek version of “cheers”).
Was there anything that was tough to handle while taking your PhD?
PhD Happy Hour in 2020
It is hard to work from home during a pandemic. A core part of my PhD used to involve multi-person collaborations, drawing illustrations on the whiteboards of the Data Science Institute, random chats in hallways, happy hours, and other social events. All these have been harder or impossible to retain during the pandemic. I miss it and look forward to enjoying it again soon.
Looking back, what would you have done differently?
If I could, I would have engaged in more discussions and collaborations, taken more classes, played more music, and slept less. 🙂
What is your advice to students on how to navigate their time at Columbia? If they want to do NLP research what should they know or do to prepare?
They should register for diverse courses; Columbia offers the opportunity to take courses from multiple departments. They should also reach out to as many people as possible and not hesitate to email graduate students and professors. I love receiving emails from people I haven’t met before, some of which have stimulated creative collaborations.
For those who want to do NLP research (which, subjectively speaking, I highly recommend), you should contact me or any person in the NLP group.
What are your plans after Columbia?
I plan to continue working on research, either as a faculty member or in an industry research and development department.
Is there anything else that you think people should know?
Columbia offers free and discounted tickets to museums and performances around New York City, even virtual art events. I personally consider New York as the “state-of-the-art”.
Shomik Ghose (SEAS’23), Tamarah Wallace (CC’22), and Jake Fisher (SEAS’22), alongside alternate Addis Boyd (SEAS’22), brought the team to victory, winning 190-150 against the USC Trojans.
In this Q&A, climate scientist Galen McKinley and computer scientist Carl Vondrick explain how Columbia’s new climate modeling center will improve on the latest projections.
Professor Shree Nayar created the course, which offers an interdisciplinary approach to computer vision. The specialization is designed for learners to acquire the foundational mathematical and physical underpinnings of computer vision.
Columbia Engineering announces the launch of a new MOOC Specialization — First Principles of Computer Vision, offered on Coursera, a leading provider of massive open online courses (MOOCs). The five-course specialization is designed for learners to acquire the foundational mathematical and physical underpinnings of computer vision.
Today, computer vision has emerged as the primary form of artificial intelligence in some of the newest and most exciting technological innovations, including driverless cars, medical imaging, optical character recognition, virtual reality, interactive games, and more. The expanding applications of computer vision cement it as an important aspect of the ever-changing world of technology and artificial intelligence. As such, this program helps satisfy the rising demand for workers who understand computer vision and its real-world applications, opening up job opportunities such as computer vision hardware engineer or computer vision researcher.
Professor Shree Nayar
Shree Nayar, T.C. Chang Professor of Computer Science and instructor for this course, describes the implications of computer vision, “The goal of computer vision is to build machines that can see. We’ve already seen very successful applications of vision such as face recognition and driverless cars. There is much more to come. In the next decade, we can expect computer vision to have a profound impact on the way we live our lives.”
First Principles of Computer Vision Specialization consists of five courses covering topics such as Camera and Imaging, Features and Boundaries, 3D Reconstruction Single Viewpoint, 3D Recognition Multiple Viewpoints, and Perception. This specialization offers an interdisciplinary approach to computer vision, giving learners a full-spectrum view of its foundations. As Nayar describes it, “While we approach vision as an engineering discipline in this program, when appropriate, we make connections with other fields such as neuroscience, psychology, art history, and even biology.”
Graduate students from the department have been selected to receive scholarships. The diverse group is a mix of those new to Columbia and students who have received fellowships for the year.
Google Fellowship
The Google PhD Fellowship Program was created to recognize outstanding graduate students doing exceptional and innovative research in areas relevant to computer science and related fields.
Yiru Chen Yiru Chen is a fourth-year Ph.D. student who works with Associate Professor Eugene Wu. Her research interests are database systems, human-computer interaction, and data exploration. Her work focuses on improving database usability by automatically generating database interfaces for interactive data analysis.
Chen graduated from Peking University with a B.S. in computer science summa cum laude and a B.A. in Economics in 2018. She enjoys cycling and playing the violin whenever she has free time.
NSF Graduate Research Fellowship Program (GRFP)
The GRFP is a five-year fellowship that recognizes and supports outstanding graduate students in NSF-supported STEM disciplines who are pursuing research-based master’s and doctoral degrees.
Philippe Chlenski Philippe Chlenski is interested in developing and applying computational techniques to biological problems, particularly machine learning for microbial dynamics. He is a second-year PhD student in the Pe’er lab. Prior to Columbia, he worked for two years at the Fellowship for Interpretation of Genomes at the Argonne National Lab.
Chlenski graduated in 2018 from Yale University with a Bachelor’s degree in mathematics and philosophy. He also holds an Associate’s degree in liberal arts from Deep Springs College.
Sam Fereidooni Sam Fereidooni is interested in investigating semantic representations through the lens of both cognitive neuroscience and natural language processing. He particularly hopes that the eventual findings from his work will lead to ameliorated treatments for those who suffer from language processing and production disorders. He is a first-year PhD student in the Theory group, and he is advised by Professor Christos Papadimitriou.
Fereidooni graduated in 2021 from Yale University with a B.S. in Cognitive Science, and a B.S. in Statistics and Data Science. Sam’s undergraduate studies were supported by the Questbridge Foundation National College Match scholarship, the Richter Undergraduate Research fellowship, and the Yale Club of New York City Charles S. Guggenheimer scholarship.
Shashaank N Shashaank N is a first-year PhD student who will be advised by assistant professor David Knowles. His research interests are in computational genomics and neuroscience, with a focus on auditory processing disorders in the brain.
Shashaank recently graduated with an MS in Computer Science from Columbia University in 2021. He completed a BS in Interdisciplinary Studies from Western Kentucky University (WKU) in 2019 and received the Scholar of the College academic award.
Meghna Pancholi Meghna Pancholi is a second-year PhD student advised by Associate Professor Simha Sethumadhavan. She is interested in cloud computing, systems security, and microservices. Before Columbia, Meghna was an undergraduate researcher at Cornell University where she worked on improving the performance of microservices applications with machine learning techniques.
Meghna graduated from Cornell University in 2020 with a BS in Computer Science.
Clayton Sanford Clayton Sanford is a third-year PhD student working with Professors Rocco Servedio and Daniel Hsu on machine learning theory. The motivating goal of his research is to understand mathematically why deep learning performs so well in practice. Clayton’s work on the approximation capabilities of neural networks has been published at the COLT 2021 conference. He is a member of the CS Theory Group.
Clayton received an ScB in Applied Math and Computer Science with honors from Brown University in 2018.
Sky Wang Sky Wang is an incoming first-year PhD student set to work with Assistant Professors Zhou Yu and Smaranda Muresan. His work focuses on natural language processing and he is interested in leveraging computational methods to understand social aspects of language and to use such insights in creating more effective and more equitable language technologies. He is particularly interested in the areas of situated dialogue systems, computational social science, and cultural analytics.
Wang graduated in 2020 from the University of Michigan with a B.S.E in Computer Science. He is a 2021 recipient of the University of Michigan’s EECS Undergraduate Outstanding Research Award and also received an honorable mention for the Computing Research Association Outstanding Undergraduate Research Award in 2021. He received a Best Poster award from the University of Michigan AI Symposium in 2018 and was recognized as a finalist in the NASA Goddard Space Flight Center Intern Research Fair in 2018.
Joseph Zuckerman Joseph Zuckerman is a second-year PhD student in computer science at Columbia University, where he works in the System-Level Design group, advised by Professor Luca Carloni. His research interests include architectures, runtime management, and agile design methodologies for many-accelerator systems-on-chip.
Zuckerman contributes as one of the main developers to ESP, an open-source research platform for heterogeneous system-on-chip design. In 2019, he completed his S.B in electrical engineering at Harvard University, during which he completed internships at NVIDIA and the NASA Jet Propulsion Lab.
SEAS Fellowships
Columbia School of Engineering and Applied Sciences established the Presidential and SEAS fellowships to recruit outstanding students from around the world to pursue graduate studies at the school.
Blavatnik Fellow
Sebastian Salazar Sebastian Salazar’s research interests include machine learning and ethical AI. At Columbia, his work will focus on counterfactual predictions and the actionability of machine learning models. He is a first-year PhD student who will be working under the guidance of Ansaf Salleb-Aouissi.
Sebastian graduated magna cum laude from Columbia University in 2021 with a B.S. in Applied Physics.
Dean’s Fellows
Huy Ha Huy Ha is an incoming first-year PhD student interested in computer vision, natural language processing, and robot learning. His research studies how embodied intelligence could combine information from different modalities (vision, language, interaction) to understand its environment, solve tasks, and assist people. He is advised by Assistant Professor Shuran Song and is a member of the Columbia Artificial Intelligence and Robotics (CAIR) lab.
Ha graduated in 2021 with a BS in Computer Science from Columbia University. He was a Dean’s Fellow and received the Theodore Bashkow Award. He did research during the summer as a Bonomi Summer Scholar. During his free time, Ha likes to take photos, rock climb, bike, and train his two border collies for frisbee.
Yun-Yun Tsai A first-year PhD student, Yun-Yun Tsai works with Professor Junfeng Yang. Her research interests are in security and artificial intelligence. In particular, she is interested in improving robustness over neural networks and machine learning (ML) algorithms so that they make fewer mistakes on malicious samples. She will work on research related to making AI applications less fragile against unusual inputs.
Tsai received B.Sc. and M.Sc. degrees in computer science from National Tsing Hua University (NTHU), Taiwan, in 2014 and 2018, respectively. Previously, she was advised by Professor Tsung-Yi Ho and by Dr. Pin-Yu Chen from the Trusted AI group at the IBM Thomas J. Watson Research Center in New York.
Mudd Fellow
Anjali Das Anjali Das is a first-year PhD student who works with Professors Itsik Pe’er and David Knowles. Her research interest is in developing and applying machine learning methods to problems in genomics. Specifically, she is interested in the genetics of neurological diseases.
Das graduated from the University of Chicago in June of 2020 with a BS in statistics and a minor in computer science. After graduating, she worked as a data scientist at UChicago’s Research Computing Center before joining Columbia.
Josh Alman, Toniann Pitassi, and Richard Zemel join the department. They will facilitate research and learning in CS theory, machine learning, and artificial intelligence.
Josh Alman
Assistant Professor
PhD Computer Science, Massachusetts Institute of Technology, 2019
MS Computer Science, Stanford University, 2016
BS Mathematics, Massachusetts Institute of Technology, 2014
Josh Alman is a theoretical computer scientist who works broadly throughout algorithm design and complexity theory. His current research focuses on algorithms for fundamental algebraic problems. In particular, he is interested in how quickly one can multiply matrices and compute important linear transforms like Fourier transforms, and how to apply algebraic tools like these to new problems throughout computer science.
Alman joins the Theory Group and looks forward to working with students who have a background in theoretical computer science or mathematics on various projects. This Fall, he will teach a graduate class on algebraic techniques in algorithms and complexity.
Toniann Pitassi
Jeffrey L. and Brenda Bleustein Professor of Engineering
PhD Computer Science, University of Toronto, 1992
MS Computer Science, Pennsylvania State University, 1985
BS Chemistry & Computer Science, Pennsylvania State University, 1985
Pitassi’s research area is computational complexity: what are the inherent limitations on the resources (time, space, randomness) required to solve fundamental computational problems? An important direction aimed at resolving the ultimate question of this type, the P versus NP question, is propositional proof complexity, which studies the difficulty of proving tautological statements in standard proof systems. She has also worked extensively in communication complexity which studies how much information must be communicated between two or more players in order to compute a joint function of their inputs. Her other research interests lie in the foundations of machine learning, particularly in the areas of privacy, fairness, and reproducibility.
Previously, Pitassi was the Bell Research Chair at the University of Toronto, a Canadian Institute for Advanced Research research chair, and a research lead at the Schwartz-Reisman Institute for Technology and Society. She currently holds a five-year appointment as visiting professor at the Institute for Advanced Study. Pitassi is the 2021 EATCS Award recipient and was an invited speaker at the International Congress of Mathematicians.
At Columbia, Pitassi joins the department’s Theory Group and the Machine Learning Group. She is excited to collaborate with new colleagues and graduate students at Columbia and to explore New York. Her hobbies and outside interests are constantly changing with the latest being stained glass.
Richard Zemel
Professor of Computer Science
PhD Computer Science, University of Toronto, 1994
MS Computer Science, University of Toronto, 1989
BA History and Science, Harvard University, 1984
Richard Zemel’s research focuses on machine learning and artificial intelligence. He is also interested in natural intelligence, including neuroscience and cognitive science. His recent research targets learning with few labels and creating robust and controllable machine learning systems, which can transfer to a variety of tasks and domains. He also has a strong interest in algorithmic fairness.
Previously, Zemel was the NSERC Industrial Research Chair in Machine Learning at the University of Toronto, and co-founded and served as the Research Director of the Vector Institute for Artificial Intelligence. He is the recipient of an ONR Young Investigator Award, an NVIDIA Pioneer of AI Award, and is a Fellow of the Canadian Institute for Advanced Research.
This Fall, he will teach a course on Neural Networks and Deep Learning, and in the Spring term he will teach a seminar class on machine learning. Zemel is looking for PhD students who are interested in machine learning to join his research group. In his spare time, he likes sports such as hockey, squash, and biking, as well as eating.
When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions “benign” and explore the benefits of size and depth for approximation of benign functions with ReLU networks. As we show, this problem is more challenging than the corresponding problem for non-benign functions. We give complexity-theoretic barriers to showing depth lower bounds: proving the existence of a benign function that cannot be approximated by polynomial-sized networks of depth 4 would settle longstanding open problems in computational complexity. It implies that beyond depth 4 there is a barrier to showing depth separation for benign functions, even between networks of constant depth and networks of nonconstant depth. We also study size separation, namely, whether there are benign functions that can be approximated with networks of size O(s(d)), but not with networks of size O(s′(d)). We show a complexity-theoretic barrier to proving such results beyond size O(d log²(d)), but also show an explicit benign function that can be approximated with networks of size O(d) and not with networks of size o(d/log d). For approximation in the L∞ sense we achieve such separation already between size O(d) and size o(d). Moreover, we show superpolynomial size lower bounds and barriers to such lower bounds, depending on the assumptions on the function. Our size-separation results rely on an analysis of size lower bounds for Boolean functions, which is of independent interest: we show linear size lower bounds for computing explicit Boolean functions (such as set disjointness) with neural networks and threshold circuits.
We study the problem of learning an unknown mixture of k permutations over n elements, given access to noisy samples drawn from the unknown mixture. We consider a range of different noise models, including natural variants of the “heat kernel” noise framework and the Mallows model. We give an algorithm which, for each of these noise models, learns the unknown mixture to high accuracy under mild assumptions and runs in n^{O(log k)} time. Our approach is based on a new procedure that recovers an unknown mixture of permutations from noisy higher-order marginals.
We study the problems of learning and testing junta distributions on {−1, 1}^n with respect to the uniform distribution, where a distribution p is a k-junta if its probability mass function p(x) depends on a subset of at most k variables. The main contribution is an algorithm for finding relevant coordinates in a k-junta distribution with subcube conditioning (Bhattacharyya and Chakraborty, 2018; Canonne et al., 2019). We give two applications: (i) an algorithm for learning k-junta distributions with Õ(k/ε²) log n + O(2^k/ε²) subcube conditioning queries, and (ii) an algorithm for testing k-junta distributions with Õ((k + √n)/ε²) subcube conditioning queries. All our algorithms are optimal up to poly-logarithmic factors. Our results show that subcube conditioning, as a natural model for accessing high-dimensional distributions, enables significant savings in learning and testing junta distributions compared to the standard sampling model. This addresses an open question posed by Aliakbarpour et al. (2016).
In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal “follow the regularized leader” (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter – from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This equivalence extends existing continuous-time versions of the “folk theorem” of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games.
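For readers unfamiliar with FTRL, the following toy numerical sketch runs entropic FTRL (i.e., exponential weights) with noiseless, oracle-based feedback in a 2x2 coordination game whose equilibrium at (action 0, action 0) is strict; the payoff matrices and learning rate are illustrative assumptions, and the iterates concentrate on the strict equilibrium, as the result described above predicts.

```python
# Toy sketch of entropic FTRL (exponential weights) in a 2x2 game with a strict
# Nash equilibrium; payoffs and learning rate are illustrative assumptions.
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])   # row player's payoff matrix
B = np.array([[2.0, 0.0], [0.0, 1.0]])   # column player's payoff matrix

def softmax(scores):
    z = np.exp(scores - scores.max())
    return z / z.sum()

eta = 0.1                   # learning rate
score_row = np.zeros(2)     # cumulative payoff vectors (the "leader" scores)
score_col = np.zeros(2)

for _ in range(2000):
    x_row = softmax(eta * score_row)     # FTRL step with an entropic regularizer
    x_col = softmax(eta * score_col)
    score_row += A @ x_col               # noiseless, oracle-based payoff feedback
    score_col += B.T @ x_row

print(np.round(x_row, 3), np.round(x_col, 3))   # both concentrate on action 0 (the strict NE)
```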
A number of recent works (Goldberg, 2006; O’Donnell and Servedio, 2011; De et al., 2017, 2014) have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of “power indices” that characterize the level of control that individual voters have over the final outcome. In the language of theoretical computer science, this is the problem of approximating an unknown linear threshold function (LTF) over {−1, 1}^n given some numerical measure (such as the function’s n “Chow parameters,” a.k.a. its degree-1 Fourier coefficients, or the vector of its n Shapley indices) of how much each of the n individual input variables affects the outcome of the function. In this paper we consider the problem of reconstructing an LTF given only partial information about its Chow parameters or Shapley indices; i.e. we are given only the Chow parameters or the Shapley indices corresponding to a subset S ⊆ [n] of the n input variables. A natural goal in this partial information setting is to find an LTF whose Chow parameters or Shapley indices corresponding to indices in S accurately match the given Chow parameters or Shapley indices of the unknown LTF. We refer to this as the Partial Inverse Power Index Problem. Our main results are a polynomial time algorithm for the (ε-approximate) Chow Parameters Partial Inverse Power Index Problem and a quasi-polynomial time algorithm for the (ε-approximate) Shapley Indices Partial Inverse Power Index Problem.
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper and lower bounds for L2-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ tools from harmonic analysis and ridgelet representation theory, while our lower bounds are based on (robust versions of) dimensionality arguments.
This paper addresses the following natural question: can efficient algorithms weakly learn convex sets under normal distributions? Strong learnability of convex sets under normal distributions is well understood, with near-matching upper and lower bounds given in Klivans et al. (2008), but prior to the current work nothing seems to have been known about weak learning. We essentially answer this question, giving near-matching algorithms and lower bounds. For our positive result, we give a poly(n)-time algorithm that can weakly learn the class of convex sets to advantage Ω(1/√n) using only random examples drawn from the background Gaussian distribution. Our algorithm and analysis are based on a new “density increment” result for convex sets, which we prove using tools from isoperimetry. We also give an information-theoretic lower bound showing that O(log(n)/√n) advantage is best possible even for algorithms that are allowed to make poly(n) many membership queries.
Four papers from CS researchers were accepted to the 30th USENIX Security Symposium. The yearly conference highlights research on the security and privacy of computer systems and networks.
Hypervisors are widely deployed by cloud computing providers to support virtual machines, but their growing complexity poses a security risk, as large codebases contain many vulnerabilities. We present SeKVM, a layered Linux KVM hypervisor architecture that has been formally verified on multiprocessor hardware. Using layers, we isolate KVM’s trusted computing base into a small core such that only the core needs to be verified to ensure KVM’s security guarantees. Using layers, we model hardware features at different levels of abstraction tailored to each layer of software. Lower hypervisor layers that configure and control hardware are verified using a novel machine model that includes multiprocessor memory management hardware such as multi-level shared page tables, tagged TLBs, and a coherent cache hierarchy with cache bypass support. Higher hypervisor layers that build on the lower layers are then verified using a more abstract and simplified model, taking advantage of layer encapsulation to reduce proof burden. Furthermore, layers provide modularity to reduce verification effort across multiple implementation versions. We have retrofitted and verified multiple versions of KVM on Arm multiprocessor hardware, proving the correctness of the implementations and that they contain no vulnerabilities that can affect KVM’s security guarantees. Our work is the first machine-checked proof for a commodity hypervisor using multiprocessor memory management hardware. SeKVM requires only modest KVM modifications and incurs only modest performance overhead versus unmodified KVM on real application workloads.
Fine Grained Dataflow Tracking with Proximal Gradients
Gabriel Ryan (Columbia University), Abhishek Shah (Columbia University), Dongdong She (Columbia University), Koustubha Bhat (Vrije Universiteit Amsterdam), and Suman Jana (Columbia University)
Dataflow tracking with Dynamic Taint Analysis (DTA) is an important method in systems security with many applications, including exploit analysis, guided fuzzing, and side-channel information leak detection. However, DTA is fundamentally limited by the Boolean nature of taint labels, which provide no information about the significance of detected dataflows and lead to false positives/negatives on complex real world programs.
We introduce proximal gradient analysis (PGA), a novel, theoretically grounded approach that can track more accurate and fine-grained dataflow information. PGA uses proximal gradients, a generalization of gradients for non-differentiable functions, to precisely compose gradients over non-differentiable operations in programs. Composing gradients over programs eliminates many of the dataflow propagation errors that occur in DTA and provides richer information about how each measured dataflow affects a program.
We compare our prototype PGA implementation to three state-of-the-art DTA implementations on 7 real-world programs. Our results show that PGA can improve the F1 accuracy of dataflow tracking by up to 33% over taint tracking (20% on average) without introducing any significant overhead (< 5% on average). We further demonstrate the effectiveness of PGA by discovering 22 bugs (20 confirmed by developers) and 2 side-channel leaks, and by identifying exploitable dataflows in 19 existing CVEs in the tested programs.
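To convey the intuition behind gradient-based dataflow tracking (this is not the PGA implementation, which composes proximal gradients operation by operation inside the program), the toy sketch below contrasts Boolean taint labels with a finite-difference gradient: both inputs are "tainted", but the gradient reveals that one of them barely influences the output.

```python
# Toy contrast between Boolean taint and a gradient-style dataflow measure
# (illustrative only; PGA itself composes proximal gradients per operation).
import numpy as np

def program(x):
    a = 3.0 * x[0] + 0.001 * x[1]   # x[1] barely influences the output
    b = max(a, 0.0)                 # non-differentiable operation (clipping)
    return b * b

def boolean_taint():
    # DTA-style answer: every input that reaches the output is tainted,
    # with no notion of how strongly it matters.
    return {"x0": True, "x1": True}

def finite_difference_gradient(f, x, eps=1e-4):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

x = np.array([1.0, 1.0])
print(boolean_taint())                          # {'x0': True, 'x1': True}
print(finite_difference_gradient(program, x))   # roughly [18.0, 0.006]
```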
AdCube: WebVR Ad Fraud and Practical Confinement of Third-Party Ads
Hyunjoo Lee (Korea Advanced Institute of Science and Technology), Jiyeon Lee (Korea Advanced Institute of Science and Technology), Daejun Kim (Korea Advanced Institute of Science and Technology), Suman Jana (Columbia University), Insik Shin (Korea Advanced Institute of Science and Technology), and Sooel Son (Korea Advanced Institute of Science and Technology)
Web technology has evolved to offer 360-degree immersive browsing experiences. This new technology, called WebVR, enables virtual reality by rendering a three-dimensional world on an HTML canvas. Unfortunately, there exists no browser-supported way of sharing this canvas between different parties. Assuming an abusive ad service provider who exploits this absence, we present four new ad fraud attack methods. Our user study demonstrates that the success rates of our attacks range from 88.23% to 100%, confirming their effectiveness. To mitigate the presented threats, we propose AdCube, which allows publishers to specify the behaviors of third-party ad code and enforce this specification. We show that AdCube is able to block the presented threats with a small page loading latency of 236 msec and a negligible frame-per-second (FPS) drop for nine WebVR official demo sites.
There are various costs for attackers to manipulate the features of security classifiers. The costs are asymmetric across features and across the directions of change, which cannot be precisely captured by existing cost models based on Lp-norm robustness. In this paper, we utilize such domain knowledge to increase the attack cost of evading classifiers, specifically tree ensemble models that are widely used in security tasks. We propose a new cost modeling method to capture the feature manipulation cost as a constraint, and then we integrate the cost-driven constraint into the node construction process to train robust tree ensembles. During the training process, we use the constraint to find data points that are likely to be perturbed given the feature manipulation cost, and we use a new robust training algorithm to optimize the quality of the trees. Our cost-aware training method can be applied to different types of tree ensembles, including gradient boosted decision trees and random forest models. Using Twitter spam detection as the case study, our evaluation results show that we can increase the attack cost by 10.6X compared to the baseline. Moreover, our robust training method using the cost-driven constraint can achieve higher accuracy, lower false positive rate, and stronger cost-aware robustness than the state-of-the-art training method using the L∞-norm cost model. Our code is available at https://github.com/surrealyz/growtrees.
Research from the department has been accepted to the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021).
We introduce a FEVER-like dataset COVID-Fact of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying whether the evidence refutes or supports a given claim. In addition to scientific claims, our data contains simplified general claims from media sources, making it better suited for detecting general misinformation regarding COVID-19. Our experiments indicate that COVID-Fact will provide a challenging testbed for the development of new systems and our approach will reduce the costs of building domain-specific datasets for detecting misinformation.
Generating metaphors is a difficult task as it requires understanding nuanced relationships between abstract concepts. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Guided by conceptual metaphor theory, we propose to control the generation process by encoding conceptual mappings between cognitive domains to generate meaningful metaphoric expressions. To achieve this, we develop two methods: 1) using FrameNet-based embeddings to learn mappings between domains and applying them at the lexical level (CM-Lex), and 2) deriving source/target pairs to train a controlled seq-to-seq generation model (CM-BART). We assess our methods through automatic and human evaluation for basic metaphoricity and conceptual metaphor presence. We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems, and CM-BART outperforms all other models both in automatic and human evaluations.
Social media has become a valuable resource for the study of suicidal ideation and the assessment of suicide risk. Among social media platforms, Reddit has emerged as the most promising one due to its anonymity and its focus on topic-based communities (subreddits) – such as r/SuicideWatch, r/Anxiety, and r/depression – that can be indicative of someone’s state of mind or interest regarding mental health disorders. A challenge for previous work on suicide risk assessment has been the small amount of labeled data. We propose an empirical investigation into several classes of weakly-supervised approaches, and show that using pseudo-labeling based on related issues around mental health (e.g., anxiety, depression) helps improve model performance for suicide risk assessment.
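The pseudo-labeling recipe referred to above can be sketched as follows; the example posts, features, and confidence threshold are invented for illustration and are not the paper's actual setup.

```python
# Minimal pseudo-labeling sketch (illustrative; the data, features, and
# confidence threshold are assumptions, not the paper's exact setup).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_posts = ["i have a plan to end it all", "rough week but i am coping okay"]
labels = np.array([1, 0])             # 1 = higher risk, 0 = lower risk (toy labels)
related_posts = [                     # unlabeled posts from related communities
    "i cannot stop worrying about everything",
    "i do not see a reason to keep going",
]

vec = TfidfVectorizer().fit(labeled_posts + related_posts)
X_labeled = vec.transform(labeled_posts).toarray()
X_related = vec.transform(related_posts).toarray()

model = LogisticRegression().fit(X_labeled, labels)
probs = model.predict_proba(X_related)[:, 1]

confident = np.abs(probs - 0.5) > 0.1            # keep only confident pseudo-labels
pseudo_labels = (probs > 0.5).astype(int)

X_train = np.vstack([X_labeled, X_related[confident]])
y_train = np.concatenate([labels, pseudo_labels[confident]])
model = LogisticRegression().fit(X_train, y_train)   # retrain on real + pseudo-labeled data
```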
A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an approach to address factual consistency in summarization. We first propose an efficient automatic evaluation metric to measure factual consistency; next, we propose a novel learning algorithm that maximizes the proposed metric during model training. Through extensive experiments, we confirm that our method is effective in improving factual consistency and even overall quality of the summaries, as judged by both automatic metrics and human evaluation.
To defend against neural system-generated fake news, an effective mechanism is urgently needed. We contribute a novel benchmark for fake news detection at the knowledge element level, as well as a solution for this task which incorporates cross-media consistency checking to detect the fine-grained knowledge elements making news articles misinformative. Due to training data scarcity, we also formulate a novel data synthesis method by manipulating knowledge elements within the knowledge graph to generate noisy training data with specific, hard to detect, known inconsistencies. Our detection approach outperforms the state-of-the-art (up to 16.8% absolute accuracy gain), and more critically, yields fine-grained explanations.
This paper proposes an approach to cross-language sentence selection in a low-resource setting. It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model. Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data. Moreover, when a rationale training secondary objective is applied to encourage the model to match word alignment hints from a phrase-based statistical machine translation model, consistent improvements are seen across three language pairs (English-Somali, English-Swahili, and English-Tagalog) over a variety of state-of-the-art baselines.
Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly. We model the social network as an And-or Graph, named SocAoG, to capture the consistency of relations among a group and to leverage attributes as inference cues. Moreover, we formulate a sequential structure prediction task and propose an α–β–γ strategy to incrementally parse SocAoG for dynamic inference upon any incoming utterance: (i) an α process predicting attributes and relations conditioned on the semantics of dialogues, (ii) a β process updating the social relations based on related attributes, and (iii) a γ process updating individuals’ attributes based on interpersonal social relations. Empirical results on DialogRE and MovieGraph show that our model infers social relations more accurately than state-of-the-art methods. Moreover, the ablation study shows the three processes complement each other, and the case study demonstrates the dynamic relational inference.
Open-domain dialog systems have a user-centric goal: to provide humans with an engaging conversation experience. User engagement is one of the most important metrics for evaluating open-domain dialog systems, and could also be used as real-time feedback to benefit dialog policy learning. Existing work on detecting user disengagement typically requires hand-labeling many dialog samples. We propose HERALD, an efficient annotation framework that reframes the training data annotation process as a denoising problem. Specifically, instead of manually labeling training samples, we first use a set of labeling heuristics to label training samples automatically. We then denoise the weakly labeled data using the Shapley algorithm. Finally, we use the denoised data to train a user engagement detector. Our experiments show that HERALD improves annotation efficiency significantly and achieves 86% user disengagement detection accuracy in two dialog corpora. Our implementation is available at https://github.com/Weixin-Liang/HERALD/.
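The heuristic-labeling stage of such a pipeline might look roughly like the sketch below; the heuristics are invented for illustration, and the Shapley-based denoising step is not shown.

```python
# Sketch of heuristic labeling for user disengagement (illustrative heuristics
# only; the paper's actual heuristics and denoising step are not reproduced).
import re

DISENGAGED, ENGAGED = 1, 0

def h_short_reply(turn):
    # Very short user replies often signal disengagement.
    return DISENGAGED if len(turn.split()) <= 2 else None

def h_abort_phrase(turn):
    # Explicit requests to end the conversation.
    return DISENGAGED if re.search(r"\b(stop|bye|shut up|nevermind)\b", turn.lower()) else None

def h_question(turn):
    # Users asking follow-up questions are usually still engaged.
    return ENGAGED if turn.strip().endswith("?") else None

HEURISTICS = [h_short_reply, h_abort_phrase, h_question]

def weak_label(turn):
    """Majority vote over heuristics that fire; None means all abstained."""
    votes = [v for v in (h(turn) for h in HEURISTICS) if v is not None]
    if not votes:
        return None
    return int(sum(votes) > len(votes) / 2)

print(weak_label("ok bye"))                    # 1 (weakly labeled as disengaged)
print(weak_label("what else can you do?"))     # 0 (weakly labeled as engaged)
```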
Emotional support is a crucial ability for many conversation scenarios, including social interactions, mental health support, and customer service chats. Following reasonable procedures and using various support skills can help to effectively provide support. However, due to the lack of a well-designed task and corpora of effective emotional support conversations, research on building emotional support into dialog systems remains untouched. In this paper, we define the Emotional Support Conversation (ESC) task and propose an ESC Framework, which is grounded on the Helping Skills Theory (Hill, 2009). We construct an Emotion Support Conversation dataset (ESConv) with rich annotation (especially support strategy) in a help-seeker and supporter mode. To ensure a corpus of high-quality conversations that provide examples of effective emotional support, we take extensive effort to design training tutorials for supporters and several mechanisms for quality control during data collection. Finally, we evaluate state-of-the-art dialog models with respect to the ability to provide emotional support. Our results show the importance of support strategies in providing effective emotional support and the utility of ESConv in training more emotional support systems.
Task-oriented dialogue systems typically require manual annotation of dialogue slots in training data, which is costly to obtain. We propose a method that eliminates this requirement: We use weak supervision from existing linguistic annotation models to identify potential slot candidates, then automatically identify domain-relevant slots by using clustering algorithms. Furthermore, we use the resulting slot annotation to train a neural-network-based tagger that is able to perform slot tagging with no human intervention. This tagger is trained solely on the outputs of our method and thus does not rely on any labeled data. Our model demonstrates state-of-the-art performance in slot tagging without labeled training data on four different dialogue domains. Moreover, we find that slot annotations discovered by our model significantly improve the performance of an end-to-end dialogue response generation model, compared to using no slot annotation at all.
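A stripped-down version of the clustering step might look like the following sketch; the candidate slot fillers, embedding choice, and number of clusters are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative sketch: group candidate slot fillers (e.g., phrases tagged by
# off-the-shelf linguistic annotators) into domain-relevant slots by clustering.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Candidate fillers extracted from restaurant-domain dialogs (toy examples).
candidates = [
    "7 pm", "8:30 tonight", "tomorrow evening",
    "downtown", "near the station", "city center",
    "italian", "thai", "vegan",
]

# Embed candidates with character n-gram TF-IDF; a real system would more
# likely use contextual embeddings.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(candidates)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for cluster_id in range(3):
    members = [c for c, label in zip(candidates, kmeans.labels_) if label == cluster_id]
    print(f"slot_{cluster_id}: {members}")   # each cluster ideally corresponds to one slot
```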
Humans are increasingly interacting with machines through language, sometimes in contexts where the user may not know they are talking to a machine (such as over the phone or through a text chatbot). We aim to understand how system designers and researchers might allow their systems to confirm their non-human identity. We collect over 2,500 phrasings related to the intent of “Are you a robot?”. This is paired with over 2,500 adversarially selected utterances where only confirming that the system is non-human would be insufficient or disfluent. We compare classifiers to recognize the intent and discuss the precision/recall and model complexity tradeoffs. Such classifiers could be integrated into dialog systems to avoid undesired deception. We then explore how both a generative research model (Blender) and two deployed systems (Amazon Alexa, Google Assistant) handle this intent, finding that systems often fail to confirm their non-human identity. Finally, we try to understand what a good response to the intent would be, and conduct a user study to compare the important aspects of responding to this intent.
On the Generation of Medical Dialogs for COVID-19 Meng Zhou, Zechen Li, Bowen Tan, Guangtao Zeng, Wenmian Yang, Xuehai He, Zeqian Ju, Subrato Chakravorty, Shu Chen, Xingyi Yang, Yichen Zhang, Qingyang Wu, Zhou Yu, Kun Xu, Eric Xing, and Pengtao Xie
Under the COVID-19 pandemic, people experiencing COVID-19-related symptoms or exposed to risk factors have a pressing need to consult doctors. Due to hospital closures, a lot of consulting services have been moved online. Because of the shortage of medical professionals, many people cannot receive online consultations in a timely manner. To address this problem, we aim to develop a medical dialogue system that can provide COVID-19-related consultations. We collected two dialogue datasets – CovidDialog – (in English and Chinese, respectively) containing conversations between doctors and patients about COVID-19. On these two datasets, we train several dialogue generation models based on Transformer, GPT, and BERT-GPT. Since the two COVID-19 dialogue datasets are small in size, which bears a high risk of overfitting, we leverage transfer learning to mitigate data deficiency. Specifically, we take the models of Transformer, GPT, and BERT-GPT pretrained on dialog datasets and other large-scale texts, then fine-tune them on our CovidDialog datasets. Experiments demonstrate that these approaches are promising in generating meaningful medical dialogues about COVID-19. But more advanced approaches are needed to build a fully useful dialogue system that can offer accurate COVID-related consultations. The data and code are available at https://github.com/UCSD-AI4H/COVID-Dialogue
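For context, fine-tuning a pretrained language model on dialog text generally follows the pattern sketched below (here using Hugging Face Transformers with GPT-2); the example dialogs, hyperparameters, and prompt are invented and do not reproduce the authors' training recipe.

```python
# Minimal fine-tuning sketch (illustrative): adapting pretrained GPT-2 to
# doctor-patient dialogs. Data, hyperparameters, and formatting are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

dialogs = [
    "Patient: I have had a dry cough and fever for three days. "
    "Doctor: Please monitor your temperature, isolate, and seek testing if symptoms worsen.",
    "Patient: I was exposed to a confirmed case last week. "
    "Doctor: Quarantine for the recommended period and watch for symptoms.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(2):
    for text in dialogs:
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
        outputs = model(**batch, labels=batch["input_ids"])  # language-modeling loss
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Generate a (toy) doctor response for a new patient query.
prompt = "Patient: I lost my sense of smell yesterday. Doctor:"
ids = tokenizer(prompt, return_tensors="pt").input_ids
reply = model.generate(ids, max_length=60, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply[0], skip_special_tokens=True))
```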
The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors in downstream dialog generation tasks. However, the performance of pre-trained models on dialog tasks is not as optimal as expected. In this paper, we propose a Pre-trained Role Alternating Language model (PRAL), designed specifically for task-oriented conversational systems. We adopt ARDM (Wu et al., 2019), which models the two speakers separately. We also design several techniques, such as start position randomization, knowledge distillation, and history discounting, to improve pre-training performance. We introduce a task-oriented dialog pre-training dataset by cleaning 13 existing datasets. We test PRAL on three different downstream tasks. The results show that PRAL performs better than or on par with state-of-the-art methods.
Associate Professor Xi Chen was awarded the 2021 Delbert Ray Fulkerson Prize for his paper “Complexity of Counting CSP with Complex Weights,” published in Journal of the Association for Computing Machinery in 2017. The award recognizes outstanding papers in discrete mathematics. It is presented at each (triennial) International Symposium of the Mathematical Programming Society.
Research from the department was presented at the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2021). The conference is a forum for academic and industry researchers to discuss their work on discourse and dialogue including discourse processing, dialogue systems, corpora, tools, and methodology.
Professor Julia Hirschberg was one of the invited keynote speakers and during her lecture she talked about how computer systems can encourage user trust for recommender systems, knowledge-delivery systems, and dialogue systems.
Most existing methods for automatic fact-checking start with a precompiled list of claims to verify. We investigate the understudied problem of determining which statements in news articles are worth fact-checking. We annotate the argument structure of 95 news articles in the climate change domain that are fact-checked by climate scientists at climatefeedback.org. We release the first multi-layer annotated corpus for both argumentative discourse structure (argument components and relations) and fact-checked statements in news articles. We discuss the connection between argument structure and check-worthy statements and develop several baseline models for detecting check-worthy statements in the climate change domain. Our preliminary results show that using information about argumentative discourse structure yields a slight but statistically significant improvement over a baseline based on local discourse structure.
While named entity recognition (NER) from speech has been around as long as NER from written text, the accuracy of NER from speech has generally been much lower than that of NER from text. The rise in popularity of spoken dialog systems such as Siri or Alexa highlights the need for more accurate NER from speech because NER is a core component for understanding what users said in dialogs. Deployed spoken dialog systems receive user input in the form of automatic speech recognition (ASR) transcripts, and simply applying an NER model trained on written text to ASR transcripts often leads to low accuracy because, compared to written text, ASR transcripts lack important cues such as punctuation and capitalization. In addition, errors in ASR transcripts make NER from speech challenging. We propose two models that exploit dialog context and speech pattern clues to extract named entities more accurately from open-domain dialogs in spoken dialog systems. Our results show the benefit of modeling dialog context and speech patterns in two settings: a standard setting with a random partition of the data, and a more realistic but also more difficult setting in which many named entities encountered during deployment are unseen during training.
Artificial intelligence chatbots are the vanguard of technology-based interventions to change people’s behavior. To develop intervention chatbots, the first step is to understand the natural language conversation strategies used in human conversation. This work introduces an intervention conversation dataset collected from a real-world physical activity intervention program for women. We designed comprehensive annotation schemes along four dimensions (domain, strategy, social exchange, and task-focused exchange) and annotated a subset of dialogs. We built a strategy classifier with context information to detect strategies from both trainers and participants based on the annotation. To understand how human intervention induces effective behavior changes, we analyzed the relationships between the intervention strategies and the participants’ changes in barriers to and social support for physical activity. We also analyzed how a participant’s baseline weight correlates with the frequency of the corresponding strategy. This work lays the foundation for developing a personalized physical activity intervention bot.
Real-world conversational agents must effectively handle long conversations that span multiple contexts. Such context can be interspersed with chitchat (dialog turns not directly related to the task at hand), and potentially grounded in a multimodal setting. While prior work focused on these aspects in isolation, there is a lack of a unified framework that studies them together. To overcome this, we propose DialogStitch, a novel framework to seamlessly ‘stitch’ multiple conversations and highlight these desirable traits in a task-oriented dialog. After stitching, our dialogs are provably deeper, contain longer-term dependencies, and span multiple contexts when compared with the source dialogs—all by leveraging existing human annotations! Though our framework generalizes to a variety of combinations, we demonstrate its benefits in two settings: (a) multimodal, image-grounded conversations, and (b) task-oriented dialogs fused with chit-chat conversations. We benchmark state-of-the-art dialog models on our datasets and find accuracy drops of (a) 12% and (b) 45% respectively, indicating the additional challenges in the stitched dialogs. Our code and data are publicly available.
Annotation Inconsistency and Entity Bias in MultiWOZ Kun Qian Columbia University, Ahmad Beirami Facebook AI, Zhouhan Lin Facebook AI, Ankita De Facebook AI, Alborz Geramifard Facebook AI, Zhou Yu Columbia University, and Chinnadhurai Sankar Facebook AI
MultiWOZ (Budzianowski et al., 2018) is one of the most popular multi-domain task-oriented dialog datasets, containing 10K+ annotated dialogs covering eight domains. It has been widely accepted as a benchmark for various dialog tasks, e.g., dialog state tracking (DST), natural language generation (NLG), and end-to-end (E2E) dialog modeling. In this work, we identify an overlooked issue of dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs, leading to confusion for DST modeling. We propose an automated correction for this issue, which is present in 70% of the dialogs. Additionally, we notice that there is significant entity bias in the dataset (e.g., “cambridge” appears in 50% of the destination cities in the train domain). The entity bias can potentially lead to named entity memorization in generative models, which may go unnoticed because the test set suffers from a similar entity bias. We release a new test set with all entities replaced with unseen entities. Finally, we benchmark the joint goal accuracy (JGA) of state-of-the-art DST baselines on these modified versions of the data. Our experiments show that the annotation inconsistency corrections lead to a 7-10% improvement in JGA. On the other hand, we observe a 29% drop in JGA when models are evaluated on the new test set with unseen entities.
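To make the unseen-entity evaluation concrete, here is a toy sketch of the kind of test-set perturbation described above. It is illustrative only, not the released data-generation code; the entity_map and the dialog/state formats are hypothetical.

```python
# Toy sketch (not the released code): swap frequently seen entities in a dialog and its
# state annotation for unseen ones, so models that memorized entities are exposed.
def replace_entities(dialog_turns, state, entity_map):
    # entity_map is a hypothetical mapping from seen to unseen entities,
    # e.g. {"cambridge": "aberdeen"}.
    new_turns = []
    for turn in dialog_turns:
        for seen, unseen in entity_map.items():
            turn = turn.replace(seen, unseen)
        new_turns.append(turn)
    new_state = {slot: entity_map.get(value, value) for slot, value in state.items()}
    return new_turns, new_state

# Example: replace_entities(["i need a train to cambridge"],
#                           {"train-destination": "cambridge"},
#                           {"cambridge": "aberdeen"})
```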
Papers from CS researchers have been accepted to the 38th International Conference on Machine Learning (ICML 2021).
Associate Professor Daniel Hsu was one of the publication chairs of the conference and Assistant Professor Elham Azizi helped organize the 2021 ICML Workshop on Computational Biology. The workshop highlighted how machine learning approaches can be tailored to making both translational and basic scientific discoveries with biological data.
Below are the abstracts and links to the accepted papers.
Causal inference from observational data can be biased by unobserved confounders. Confounders—the variables that affect both the treatments and the outcome—induce spurious non-causal correlations between the two. Without additional conditions, unobserved confounders generally make causal quantities hard to identify. In this paper, we focus on the setting where there are many treatments with shared confounding, and we study under what conditions causal identification is possible. The key observation is that we can view subsets of treatments as proxies of the unobserved confounder and identify the intervention distributions of the rest. Moreover, while existing identification formulas for proxy variables involve solving integral equations, we show that one can circumvent the need for such solutions by directly modeling the data. Finally, we extend these results to an expanded class of causal graphs, those with other confounders and selection variables.
We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications. We argue that the deep encoder should maximize its nonlinear expressivity on the data for downstream predictors to take full advantage of its representation power. To this end, NAC maximizes the mutual information between activation patterns of the encoder and the data over a noisy communication channel. We show that learning for a noise-robust activation code increases the number of distinct linear regions of ReLU encoders, hence the maximum nonlinear expressivity. More interestingly, NAC learns both continuous and discrete representations of data, which we respectively evaluate on two downstream tasks: (i) linear classification on CIFAR-10 and ImageNet-1K and (ii) nearest neighbor retrieval on CIFAR-10 and FLICKR-25K. Empirical results show that NAC attains better or comparable performance on both tasks over recent baselines including SimCLR and DistillHash. In addition, NAC pretraining provides significant benefits to the training of deep generative models. Our code is available at https://github.com/yookoon/nac.
The Logical Options Framework Brandon Araki MIT, Xiao Li MIT, Kiran Vodrahalli Columbia University, Jonathan DeCastro Toyota Research Institute, Micah Fry MIT Lincoln Laboratory, Daniela Rus MIT CSAIL
Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing the task as an automaton and integrating it into learning and planning. We provide and prove conditions under which LOF will learn satisfying, optimal policies. Lastly, we show how LOF’s learned policies can be composed to satisfy unseen tasks with only 10-50 retraining steps on our benchmarks. We evaluate LOF on four tasks in discrete and continuous domains, including a 3D pick-and-place environment.
General methods have been developed for estimating causal effects from observational data under causal assumptions encoded in the form of a causal graph. Most of this literature assumes that the underlying causal graph is completely specified. However, only observational data is available in most practical settings, which means that one can learn at most a Markov equivalence class (MEC) of the underlying causal graph. In this paper, we study the problem of causal estimation from a MEC represented by a partial ancestral graph (PAG), which is learnable from observational data. We develop a general estimator for any identifiable causal effects in a PAG. The result fills a gap for an end-to-end solution to causal inference from observational data to effects estimation. Specifically, we develop a complete identification algorithm that derives an influence function for any identifiable causal effects from PAGs. We then construct a double/debiased machine learning (DML) estimator that is robust to model misspecification and biases in nuisance function estimation, permitting the use of modern machine learning techniques. Simulation results corroborate the theory.
Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into “domains” or “environments”. Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds dataset. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.
Sketch drawings capture the salient information of visual concepts. Previous work has shown that neural networks are capable of producing sketches of natural objects drawn from a small number of classes. While earlier approaches focus on generation quality or retrieval, we explore properties of image representations learned by training a model to produce sketches of images. We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting. Additionally, we find that these learned representations exhibit interesting structure and compositionality.
Few-shot dataset generalization is a challenging variant of the well-studied few-shot classification problem where a diverse training set of several datasets is given, for the purpose of training an adaptable model that can then learn classes from new datasets using only a few examples. To this end, we propose to utilize the diverse training set to construct a universal template: a partial model that can define a wide array of dataset-specialized models, by plugging in appropriate components. For each new few-shot classification problem, our approach therefore only requires inferring a small number of parameters to insert into the universal template. We design a separate network that produces an initialization of those parameters for each given task, and we then fine-tune its proposed initialization via a few steps of gradient descent. Our approach is more parameter-efficient, scalable and adaptable compared to previous methods, and achieves the state-of-the-art on the challenging Meta-Dataset benchmark.
On Monotonic Linear Interpolation of Neural Network Parameters James Lucas University of Toronto, Juhan Bae University of Toronto, Michael Zhang University of Toronto, Stanislav Fort Google AI, Richard Zemel Columbia University, Roger Grosse University of Toronto
Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective. This Monotonic Linear Interpolation (MLI) property, first observed by Goodfellow et al. 2014, persists in spite of the non-convex objectives and highly non-linear training dynamics of neural networks. Extending this work, we evaluate several hypotheses for this property that, to our knowledge, have not yet been explored. Using tools from differential geometry, we draw connections between the interpolated paths in function space and the monotonicity of the network — providing sufficient conditions for the MLI property under mean squared error. While the MLI property holds under various settings (e.g., network architectures and learning problems), we show in practice that networks violating the MLI property can be produced systematically, by encouraging the weights to move far from initialization. The MLI property raises important questions about the loss landscape geometry of neural networks and highlights the need to further study their global properties.
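The MLI property itself is easy to probe empirically. The sketch below is a minimal illustration, not the paper's code: it evaluates the training loss at evenly spaced points on the straight line between the initial and trained parameters, and MLI holds when the resulting curve is roughly non-increasing.

```python
# Minimal sketch (not the paper's code): training loss along the straight line
# theta(alpha) = (1 - alpha) * theta_0 + alpha * theta_T, for alpha in [0, 1].
import copy
import torch

def loss_along_interpolation(model_init, model_trained, loss_fn, data_loader, steps=11):
    device = next(model_trained.parameters()).device
    probe = copy.deepcopy(model_trained)            # container for interpolated weights
    theta0 = [p.detach().clone() for p in model_init.parameters()]
    thetaT = [p.detach().clone() for p in model_trained.parameters()]
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        with torch.no_grad():
            for p, p0, pT in zip(probe.parameters(), theta0, thetaT):
                p.copy_((1 - alpha) * p0 + alpha * pT)
            total, n = 0.0, 0
            for x, y in data_loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(probe(x), y).item() * len(y)
                n += len(y)
        losses.append(total / n)
    return losses  # MLI holds if this sequence is (approximately) non-increasing
```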
Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems. We take an initial step toward machine generation of slang by developing a framework that models the speaker’s word choice in slang context. Our framework encodes novel slang meaning by relating the conventional and slang senses of a word while incorporating syntactic and contextual knowledge in slang usage. We construct the framework using a combination of probabilistic inference and neural contrastive learning. We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang word usages from the 1960s to the 2000s. We interpret the proposed models and find that the contrastively learned semantic space is sensitive to the similarities between slang and conventional senses of words. Our work creates opportunities for the automated generation and interpretation of informal language.
We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retrieve skills learned in the past, our online few-shot learning setting also features an underlying context that changes throughout time. Object classes are correlated within a context and inferring the correct context can lead to better performance. Building upon this setting, we propose a new few-shot learning dataset based on large scale indoor imagery that mimics the visual experience of an agent wandering within a world. Furthermore, we convert popular few-shot learning approaches into online versions and we also propose a new contextual prototypical memory model that can make use of spatiotemporal contextual information from the recent past.
Few-shot classification (FSC), the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. Bayesian methods are well-suited to tackling the fundamental issue of overfitting in the few-shot scenario because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data. Contemporary approaches to Bayesian few-shot classification maintain a posterior distribution over model parameters, which is slow and requires storage that scales with model size. Instead, we propose a Gaussian process classifier based on a novel combination of Pólya-Gamma augmentation and the one-vs-each softmax approximation that allows us to efficiently marginalize over functions rather than model parameters. We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.
Theoretical Bounds On Estimation Error For Meta-Learning James Lucas University of Toronto, Mengye Ren University of Toronto, Irene Kameni African Master for Mathematical Sciences, Toni Pitassi Columbia University, Richard Zemel Columbia University
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent successes in few-shot learning and related problems are encouraging signs that these models can be adapted to more realistic settings where train and test distributions differ. Unfortunately, there is severely limited theoretical support for these algorithms and little is known about the difficulty of these problems. In this work, we provide novel information-theoretic lower bounds on minimax rates of convergence for algorithms that are trained on data from multiple sources and tested on novel data. Our bounds depend intuitively on the information shared between sources of data, and characterize the difficulty of learning in this setting for arbitrary algorithms. We demonstrate these bounds on a hierarchical Bayesian model of meta-learning, computing both upper and lower bounds on parameter estimation via maximum-a-posteriori inference.
In this paper, we derive generalization bounds for the two primary classes of graph neural networks (GNNs), namely graph convolutional networks (GCNs) and message passing GNNs (MPGNNs), via a PAC-Bayesian approach. Our result reveals that the maximum node degree and spectral norm of the weights govern the generalization bounds of both models. We also show that our bound for GCNs is a natural generalization of the results developed in arXiv:1707.09564v2 [cs.LG] for fully-connected and convolutional neural networks. For message passing GNNs, our PAC-Bayes bound improves over the Rademacher complexity based bound in arXiv:2002.06157v1 [cs.LG], showing a tighter dependency on the maximum node degree and the maximum hidden dimension. The key ingredients of our proofs are a perturbation analysis of GNNs and the generalization of PAC-Bayes analysis to non-homogeneous GNNs. We perform an empirical study on several real-world graph datasets and verify that our PAC-Bayes bound is tighter than others.
Distributed systems are notoriously hard to implement correctly due to non-determinism. Finding the inductive invariant of the distributed protocol is a critical step in verifying the correctness of distributed systems, but takes a long time to do even for simple protocols. We present DistAI, a data-driven automated system for learning inductive invariants for distributed protocols. DistAI generates data by simulating the distributed protocol at different instance sizes and recording states as samples. Based on the observation that invariants are often concise in practice, DistAI starts with small invariant formulas and enumerates all strongest possible invariants that hold for all samples. It then feeds those invariants and the desired safety properties to an SMT solver to check if the conjunction of the invariants and the safety properties is inductive. Starting with small invariant formulas and strongest possible invariants avoids large SMT queries, improving SMT solver performance. Because DistAI starts with the strongest possible invariants, if the SMT solver fails, DistAI does not need to discard failed invariants, but knows to monotonically weaken them and try again with the solver, repeating the process until it eventually succeeds. We prove that DistAI is guaranteed to find the ∃-free inductive invariant that proves the desired safety properties in finite time, if one exists. Our evaluation shows that DistAI successfully verifies 13 common distributed protocols automatically and outperforms alternative methods both in the number of protocols it verifies and the speed at which it does so, in some cases by more than two orders of magnitude.
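The core enumerate-check-weaken loop can be sketched schematically as below. This is a simplification under stated assumptions, not DistAI's implementation: enumerate_candidates, holds_on, smt_check_inductive, and weaken are hypothetical stand-ins for the system's formula enumeration, sample checking, SMT query, and monotone weakening.

```python
# Schematic sketch of the refinement loop (all helper functions are hypothetical
# stand-ins, not DistAI's actual API).
def learn_inductive_invariant(samples, safety, max_literals=3):
    # Keep only the small candidate formulas that hold on every simulated sample.
    candidates = [inv for inv in enumerate_candidates(max_literals)
                  if all(holds_on(inv, s) for s in samples)]
    while candidates:
        ok, failed = smt_check_inductive(candidates, safety)  # one SMT query
        if ok:
            return candidates  # the conjunction of candidates and safety is inductive
        # Monotonically weaken the formulas the solver could not prove, rather than
        # discarding them, then ask the solver again.
        candidates = [weaken(inv) if inv in failed else inv for inv in candidates]
        candidates = [inv for inv in candidates if inv is not None]  # weaken may drop a formula
    return None
```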
Modern desktop applications involve many asynchronous, concurrent interactions that make performance issues difficult to diagnose. Although prior work has used causal tracing for debugging performance issues in distributed systems, we find that these techniques suffer from high inaccuracies for desktop applications. We present Argus, a fast, effective causal tracing tool for debugging performance anomalies in desktop applications. Argus introduces a novel notion of strong and weak edges to explicitly model and annotate trace graph ambiguities, a new beam-search-based diagnosis algorithm to select the most likely causal paths in the presence of ambiguities, and a new way to compare causal paths across normal and abnormal executions. We have implemented Argus across multiple versions of macOS and evaluated it on 12 infamous spinning pinwheel issues in popular macOS applications. Argus diagnosed the root causes for all issues, 10 of which were previously unknown, some of which have been open for several years. Argus incurs less than 5% CPU overhead when its system-wide tracing is enabled, making always-on tracing feasible.
Assistant Professor Carl Vondrick, Didac Suris, and Ruoshi Liu developed a computer vision algorithm for predicting human interactions and body language in video, a capability that could have applications for assistive technology, autonomous vehicles, and collaborative robots.
Jun Rao (PhD’00) co-founded Confluent, a platform that makes it easy to connect apps, systems, and an entire organization with real-time data flows and processing. The company develops database technologies and recently launched its IPO.
Associate Professor Simha Sethumadhavan, Mohamed Tarek, and Miguel Arroyo designed new techniques to bolster memory safety; their ideas are now being used by the Air Force Research Lab.
Research from the department has been accepted to the 2021 Computer Vision and Pattern Recognition (CVPR) Conference. The annual event explores machine learning, artificial intelligence, and computer vision research and its applications.
Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to supervision requirements. Particularly, learning more object categories typically requires proportionally more bounding box annotations. Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models. In this paper, we put forth a novel formulation of the object detection problem, namely open-vocabulary object detection, which is more general, more practical, and more effective than weakly supervised and zero-shot approaches. We propose a new method to train object detectors using bounding box annotations for a limited set of object categories, as well as image-caption pairs that cover a larger variety of objects at a significantly lower cost. We show that the proposed method can detect and localize objects for which no bounding box annotation is provided during training, at a significantly higher accuracy than zero-shot approaches. Meanwhile, objects with bounding box annotation can be detected almost as accurately as supervised methods, which is significantly better than weakly supervised baselines. Accordingly, we establish a new state-of-the-art for scalable object detection.
We present Vx2Text, a framework for text generation from multimodal inputs consisting of video plus text, speech, or audio. In order to leverage transformer networks, which have been shown to be effective at modeling language, each modality is first converted into a set of language embeddings by a learnable tokenizer. This allows our approach to perform multimodal fusion in the language space, thus eliminating the need for ad-hoc cross-modal fusion modules. To address the non-differentiability of tokenization on continuous inputs (e.g., video or audio), we utilize a relaxation scheme that enables end-to-end training. Furthermore, unlike prior encoder-only models, our network includes an autoregressive decoder to generate open-ended text from the multimodal embeddings fused by the language encoder. This renders our approach fully generative and makes it directly applicable to different “video+x to text” problems without the need to design specialized network heads for each task. The proposed framework is not only conceptually simple but also remarkably effective: experiments demonstrate that our approach based on a single architecture outperforms the state-of-the-art on three video-based text-generation tasks—captioning, question answering, and audio-visual scene-aware dialog. Our code will be made publicly available.
In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expression and scene dynamics. Unlike previous methods, which solve the problem in multiple stages (i.e., tracking, proposal-based matching), we tackle the problem from a novel perspective, co-grounding, with an elegant one-stage framework. We enhance the single-frame grounding accuracy by semantic attention learning and improve the cross-frame grounding consistency with co-grounding feature learning. Semantic attention learning explicitly parses referring cues in different attributes to reduce the ambiguity in the complex expression. Co-grounding feature learning boosts visual feature representations by integrating temporal correlation to reduce the ambiguity caused by scene dynamics. Experimental results demonstrate the superiority of our framework on the video grounding datasets VID and OTB in generating accurate and stable results across frames. Our model is also applicable to referring expression comprehension in images, as illustrated by the improved performance on the RefCOCO dataset. Our project is available at https://sijiesong.github.io/co-grounding.
We propose a new flash technique for low-light imaging, using deep-red light as an illuminating source. Our main observation is that in a dim environment, the human eye mainly uses rods for the perception of light, which are not sensitive to wavelengths longer than 620nm, yet the camera sensor still has a spectral response. We propose a novel modulation strategy when training a modern CNN model for guided image filtering, fusing a noisy RGB frame and a flash frame. This fusion network is further extended for video reconstruction. We have built a prototype with minor hardware adjustments and tested the new flash technique on a variety of static and dynamic scenes. The experimental results demonstrate that our method produces compelling reconstructions, even in extra dim conditions.
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training Mingyang Zhou University of California, Davis, Luowei Zhou Microsoft Dynamics 365 AI Research, Shuohang Wang Microsoft Dynamics 365 AI Research, Yu Cheng Microsoft Dynamics 365 AI Research, Linjie Li Microsoft Dynamics 365 AI Research, Zhou Yu University of California, Davis and Columbia University, Jingjing Liu Microsoft Dynamics 365 AI Research
Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language. To generalize this success to non-English languages, we introduce UC^2, the first machine translation-augmented framework for cross-lingual cross-modal representation learning. To tackle the scarcity problem of multilingual captions for image datasets, we first augment existing English-only datasets with other languages via machine translation (MT). Then we extend the standard Masked Language Modeling and Image-Text Matching training objectives to a multilingual setting, where alignment between different languages is captured through a shared visual context (e.g., using the image as a pivot). To facilitate the learning of a joint embedding space of images and all languages of interest, we further propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM), leveraging MT-enhanced translated data. Evaluation on multilingual image-text retrieval and multilingual visual question answering benchmarks demonstrates that our proposed framework achieves a new state of the art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.
We introduce a framework that predicts the goals behind observable human action in video. Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of goals without direct supervision. Our approach models videos as contextual trajectories that represent both low-level motion and high-level action features. Experiments and visualizations show our trained model is able to predict the underlying goals in video of unintentional action. We also propose a method to “automatically correct” unintentional action by leveraging gradient signals of our model to adjust latent trajectories. Although the model is trained with minimal supervision, it is competitive with or outperforms baselines trained on large (supervised) datasets of successfully executed goals, showing that observing unintentional action is crucial to learning about goals in video.
Generative Interventions for Causal Learning Chengzhi Mao Columbia University, Augustine Cha Columbia University, Amogh Gupta Columbia University, Hao Wang Rutgers University, Junfeng Yang Columbia University, Carl Vondrick Columbia University
We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts. Discriminative models often learn naturally occurring spurious correlations, which cause them to fail on images outside of the training distribution. In this paper, we show that we can steer generative models to manufacture interventions on features caused by confounding factors. Experiments, visualizations, and theoretical results show this method learns robust representations more consistent with the underlying causal relationships. Our approach improves performance on multiple datasets demanding out-of-distribution generalization, and we demonstrate state-of-the-art performance generalizing from ImageNet to the ObjectNet dataset.
We introduce a framework for learning from unlabeled video what is predictable in the future. Instead of committing up front to features to predict, our approach learns from data which features are predictable. Based on the observation that hyperbolic geometry naturally and compactly encodes hierarchical structure, we propose a predictive model in hyperbolic space. When the model is most confident, it will predict at a concrete level of the hierarchy, but when the model is not confident, it learns to automatically select a higher level of abstraction. Experiments on two established datasets show the key role of hierarchical representations for action prediction. Although our representation is trained with unlabeled video, visualizations show that action hierarchies emerge in the representation.
Generative Adversarial Networks (GANs) are able to generate high-quality images, but it remains difficult to explicitly specify the semantics of synthesized images. In this work, we aim to better understand the semantic representation of GANs, and thereby enable semantic control in GAN’s generation process. Interestingly, we find that a well-trained GAN encodes image semantics in its internal feature maps in a surprisingly simple way: a linear transformation of feature maps suffices to extract the generated image semantics. To verify this simplicity, we conduct extensive experiments on various GANs and datasets; and thanks to this simplicity, we are able to learn a semantic segmentation model for a trained GAN from a small number (e.g., 8) of labeled images. Last but not least, leveraging our finding, we propose two few-shot image editing approaches, namely Semantic-Conditional Sampling and Semantic Image Editing. Given a trained GAN and as few as eight semantic annotations, the user is able to generate diverse images subject to a user-provided semantic layout, and control the synthesized image semantics. We have made the code publicly available.
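The "linear transformation of feature maps" finding lends itself to a very small sketch. The code below is illustrative, not the paper's released code: it fits a per-pixel linear probe, a 1x1 convolution, on upsampled generator activations from a handful of generated images with labeled masks; the feature and mask tensors are assumed to be given.

```python
# Minimal sketch (not the paper's code): a linear per-pixel probe over GAN feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_linear_probe(features, masks, num_classes, epochs=200, lr=1e-2):
    # features: (N, C, H, W) upsampled generator activations for N generated images.
    # masks:    (N, H, W) integer segmentation labels for the same images.
    probe = nn.Conv2d(features.shape[1], num_classes, kernel_size=1)  # linear map per pixel
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(probe(features), masks)
        loss.backward()
        opt.step()
    return probe

# Usage (shapes only): probe = fit_linear_probe(feats, labels, num_classes=5);
# then probe(new_feats).argmax(dim=1) segments a newly generated image.
```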
Research papers from the Natural Language Processing and Speech groups have been accepted to the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021).
Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.
Supporting Clustering with Contrastive Learning Dejiao Zhang AWS AI, Feng Nan AWS AI, Xiaokai Wei AWS AI, Shang-Wen Li AWS AI, Henghui Zhu AWS AI, Kathleen McKeown Columbia University, Ramesh Nallapati AWS AI, Andrew O. Arnold AWS AI, and Bing Xiang AWS AI
Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL) – a novel framework to leverage contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets with 3%-11% improvement on Accuracy and 4%-15% improvement on Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground truth cluster labels.
The problem of detecting psychological stress in online posts, and more broadly, of detecting people in distress or in need of help, is a sensitive application for which the ability to interpret models is vital. Here, we present work exploring the use of a semantically related task, emotion detection, for equally competent but more explainable and human-like psychological stress detection as compared to a black-box model. In particular, we explore the use of multi-task learning as well as emotion-based language model fine-tuning. With our emotion-infused models, we see comparable results to state-of-the-art BERT. Our analysis of the words used for prediction shows that our emotion-infused models mirror the psychological components of stress.
Framing involves the positive or negative presentation of an argument or issue depending on the audience and goal of the speaker (Entman, 1983). Differences in lexical framing, the focus of our work, can have large effects on people’s opinions and beliefs. To make progress towards reframing arguments for positive effects, we create a dataset and method for this task. We use a lexical resource for connotations to create a parallel corpus and propose a method for argument reframing that combines controllable text generation (positive connotation) with a post-decoding entailment component (same denotation). Our results show that our method is effective compared to strong baselines along the dimensions of fluency, meaning, and trustworthiness/reduction of fear.
Generating metaphors is a challenging task as it requires a proper understanding of abstract concepts, making connections between unrelated concepts, and deviating from the literal meaning. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Based on a theoretically-grounded connection between metaphors and symbols, we propose a method to automatically construct a parallel corpus by transforming a large number of metaphorical sentences from the Gutenberg Poetry corpus (Jacobs, 2018) to their literal counterpart using recent advances in masked language modeling coupled with commonsense inference. For the generation task, we incorporate a metaphor discriminator to guide the decoding of a sequence to sequence model finetuned on our parallel data to generate high-quality metaphors. Human evaluation on an independent test set of literal statements shows that our best model generates metaphors better than three well-crafted baselines 66% of the time on average. Moreover, a task-based evaluation shows that human-written poems enhanced with metaphors proposed by our model are preferred 68% of the time compared to poems without metaphors.
Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking Zhaojiang Lin The Hong Kong University of Science and Technology, Bing Liu Facebook, Seungwhan Moon Facebook, Paul Crook Facebook, Zhenpeng Zhou Facebook, Zhiguang Wang Facebook, Zhou Yu Columbia University, Andrea Madotto The Hong Kong University of Science and Technology, Eunjoon Cho Facebook, and Rajen Subba Facebook
Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder and generates slot values in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information across slots to facilitate cross-domain knowledge transfer. Experimental results on the MultiWOZ dataset show that our proposed method significantly improves existing state-of-the-art results in the zero-shot cross-domain setting.
Existing goal-oriented dialogue datasets focus mainly on identifying slots and values. However, customer support interactions in reality often involve agents following multi-step procedures derived from explicitly defined company policies as well. To study customer service dialogue systems in more realistic settings, we introduce the Action-Based Conversations Dataset (ABCD), a fully-labeled dataset with over 10K human-to-human dialogues containing 55 distinct user intents requiring unique sequences of actions constrained by policies to achieve task success. We propose two additional dialog tasks, Action State Tracking and Cascading Dialogue Success, and establish a series of baselines involving large-scale, pre-trained language models on this dataset. Empirical results demonstrate that while more sophisticated networks outperform simpler models, a considerable gap (50.8% absolute accuracy) still exists to reach human-level performance on ABCD.
Self-Training with Weak Supervision Giannis Karamanolakis Columbia University, Subhabrata Mukherjee Microsoft Research, Guoqing Zheng Microsoft Research, Ahmed Hassan Awadallah Microsoft Research
State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to automatically generate weakly labeled training data. However, learning with weak rules is challenging due to their inherent heuristic and noisy nature. An additional challenge is rule coverage and overlap, where prior work on weak supervision only considers instances that are covered by weak rules, thus leaving valuable unlabeled data behind. In this work, we develop a weak supervision framework (ASTRA) that leverages all the available data for a given task. To this end, we leverage task-specific unlabeled data through self-training with a model (student) that considers contextualized representations and predicts pseudo-labels for instances that may not be covered by weak rules. We further develop a rule attention network (teacher) that learns how to aggregate student pseudo-labels with weak rule labels, conditioned on their fidelity and the underlying context of an instance. Finally, we construct a semi-supervised learning objective for end-to-end training with unlabeled data, domain-specific rules, and a small amount of labeled data. Extensive experiments on six benchmark datasets for text classification demonstrate the effectiveness of our approach with significant improvements over state-of-the-art baselines.
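One self-training round in the spirit of this setup can be sketched as follows. This is a deliberately simplified illustration, not the ASTRA implementation: the rule attention teacher is replaced by a fixed-weight mixture, and rule_fns is a hypothetical list of weak-rule functions mapping a text to a class id or None.

```python
# Simplified sketch of one teacher step (not the ASTRA implementation): combine weak-rule
# votes with student pseudo-labels; uncovered instances fall back to the student alone.
import numpy as np

def teacher_soft_labels(texts, student_probs, rule_fns, num_classes, rule_weight=0.5):
    labels = np.zeros((len(texts), num_classes))
    for i, text in enumerate(texts):
        votes = [v for v in (rule(text) for rule in rule_fns) if v is not None]
        soft = student_probs[i].copy()                 # student pseudo-label distribution
        if votes:                                      # mix in weak-rule votes when covered
            rule_dist = np.bincount(votes, minlength=num_classes) / len(votes)
            soft = rule_weight * rule_dist + (1 - rule_weight) * soft
        labels[i] = soft
    return labels  # used as soft targets to retrain the student on unlabeled data
```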
Assistant Professor Shuran Song has won a 2021 Microsoft Research Faculty Fellowship. The fellowship recognizes innovative, promising new faculty whose exceptional talent for innovation identifies them as emerging leaders in their fields.
Google Research held an online workshop on the conceptual understanding of deep learning. The workshop discussed how new findings in deep learning and neuroscience can help create better artificial intelligence systems. Christos Papadimitriou discussed how our growing understanding of information-processing mechanisms in the brain might help create algorithms that are more robust in understanding and engaging in conversations. Papadimitriou presented a simple and efficient model that explains how different areas of the brain inter-communicate to solve cognitive problems.
Professors Jason Nieh and Ronghui Gu built the first system to guarantee the security of virtual machines in the cloud. SeKVM could transform how cloud services are designed, developed, deployed, and trusted.
The Gödel Prize for outstanding papers in the area of theoretical computer science is sponsored jointly by the EATCS and the ACM SIGACT. Chen is recognized for his 2017 paper on constraint satisfaction problems (CSP).
The department is extremely proud of all of our students and would like to honor this year’s graduates! We look forward to when we can all come together and celebrate in person.
First, a tribute from our faculty!
During Class Day, awards were given to students who excelled in academics, to students with independent research projects, and to those with outstanding performance in teaching and service to the department. The list of awardees is in this year’s graduation handout.
Professor Augustin Chaintreau received the Janette and Armen Avanessians Diversity Award for outstanding research and leadership in advancing diversity in departmental, school, and university programs at Columbia.
At this year’s commencement, more than 600 students received a computer science degree. Click on the logos to see the CS graduates from each college.
Student spotlight
More memories from the past four years…
Columbia and Barnard students attend the 2019 Grace Hopper Conference in Orlando.
Lydia Chilton is the best professor at Columbia hands down. No one brings energy to the CS department the way that she does. GOAT. - Dillon Hayes
Ciaran Beckford and The Ngo at the PennApps 2017 Hackathon.
Charles Baird coding in the kitchen.
AI Generated Artwork, JR Carneiro CS SEAS '21 and Caroline Lin CS CC '21
Bryan Lei and John Wang at the AP Hackathon in Spring 2018
Audrey Cheng with fellow SWE interns at a Salesforce intern event in Denver, Colorado.
Cryptography with Tal! This class was so fascinating. - Ari Hirsch
Arkadiy Saakyan with friends at the tree lighting.
Anushri Arora took this photo of Olha Maslova while they were working on an Analysis of Algorithms problem set at the DSI lounge in late February 2020 - just before the "Stay at Home" story began.
Anastasia Dmitrienko and fellow TA Sophia Kolak giving a midterm recitation for CS Theory.
Amir Idris with the Columbia Data Science Society 2018
Alexander Peile and Tracy Chen declaring majors
Alexander Cohen on the right with other (non-Columbia) students in San Francisco during an internship.
Alex Horimbere on the day when he arrived at Columbia.
Left to right: Cesar Ramos Medina, Iliana Cantu, Ecenaz Ozmen, Yefri Gaitan, Daniel Garces Botero at Google Games, where they won the most energetic table. Go red team!
Daniel Hanoch (in black jacket) and co-workers at the Earth Institute.
Yarden Carmeli at the Low (High) Beach
The Operating Systems group Maya, Ecenaz, and Ahmed grinding hard for OS projects.
Tanvi Hisaria, Manav, and Aeshna making the best of the Zoom year!
Shout out to DSA with Professor Bauer because he is almost always online, super approachable, and really makes an effort to help me understand CS concepts in general. - Andrew Molina
Sarah Radway and Sophia Kolak at Hack Harvard...trying to win "most useless hack" and still losing.
Sharon Jian with former and current TAs for UI Design grading together
Payal Chandak exploring the tunnels on campus
Rediet Bekele sneaking a Post Lab 7 nap in AP class (Sorry, Jae 🙁 )
Romy Zilkha studying at Butler Library
Rupal Gupta at Googleplex, where she interned her junior summer, and will be joining full-time.
Sara Bernstein and Rebecca Narin (BC '21), who became good friends after taking COMS 1004 together, as they finished their last class of freshman year!
Owen Bishop, Dominic Dyer, Jackson Storey, and Jon Lauer at the park.
Omer Fahri Onder kicking an opponent at the MIT taekwondo competition
Huge thanks to Professor Blaer for an engaging, fun, and interesting course in data structures. The knowledge I gained in this course has been the foundation for my coding abilities and a sturdy rock in my journey to becoming a data science graduate. - Aaron Morrill (with his sister, Haley)
Visiting Giphy as a part of Entrepreneurship Via Exploration with Anne Xie, Jenny Li, Monika Francsics, Sungbin Kim, Jeff Huang, Jordan Ramos, and Nico Molina.
Michael Karasev, Chris Mendell, Rodolfo Raimundo, Nathelie Hager, and Sabrina Selleck.
After flying to El Paso during finals week and pulling 3 all-nighters, they were finally able to watch their scientific payload launch on board Blue Origin's rocket.
Harrison Qu in matrix code.
Left to Right: Kelsey Namba, Serena Tam, David Ji at the Princeton Hackathon.
Lucas Hahn
Manasi Sharma at the GROWTH 2019 Conference at San Diego State University during a coding challenge.
Mariya Delyakova and COMS 1004 classmates a week before COVID hit.
Marwan Salam in Havemeyer 309 - Concluding AP Lecture
Jonathan Sanabria (center in blue) with other members of the Columbia Robotics Club working on a prototype for an underwater ROV.
Discrete Math got me like...
Jinho Lee on a site tour of Google.
Jin Woo Won and Junyao Shi at the AP hackathon, hacking away as Jae played some music.
Jeffrey Kline - remembering long days in the library.
Malware Analysis is a great class - I love reverse engineering and learning how programs and operating systems work, so this class was particularly fun for me. - Christopher Smith (with his very sleepy pup!)
Samuel Fu's internship memories in 2019
JADE cohort January 2018 at Facebook
Jason Herrera at the Brooklyn Bridge in 2019.
Giovanni Sanchez hanging out with friends at Faculty House.
Gael Zendejas, Vikram Ho, Xiao Lim, and Adam Lin at the AP Hackathon.
Ryan James and fellow classmates in AP
From left: Kelsey Namba, Sarah Leventhal, Serena Tam. Taken in Lerner after a whole evening of working on a lab assignment in Advanced Programming.
Beom Joon Baek with James Valentini during Freshman year.
Danny Parrill at the virtual Columbia GS Honor Society induction.
Dorothee Grant and Serena Killion in class when Professor Sethumadhavan took attendance by having the students send a photo of them holding up their UNIs.
Daniel Halmos presenting the architecture for a group project in Topics for Robotic Learning.
Six papers from CS researchers were accepted to the 16th conference of the European Chapter of the Association for Computational Linguistics (EACL). As the flagship European conference in the field of computational linguistics, EACL welcomes European and international researchers covering a broad spectrum of research areas that are concerned with computational approaches to natural language.
Below are brief descriptions and links to the papers.
This paper presents a new clustering paradigm for news streams, where clusters have a one-to-one correspondence with real-world events (for example, the Suez Canal blockage). An important aspect of this problem is that the number of clusters is unknown and varies with time (new events occur and old events cease to be of relevance). The proposed paradigm follows a pipeline approach: representations are built for each new article, comparisons are made with existing clusters to pick the most compatible one, and finally a clustering decision is produced.
A surprising observation from this work is that contextual embeddings (from models like BERT), in contrast to their overwhelming success in many NLP problems, achieve sub-par performance by themselves on this clustering problem. However, when combined with other representations (like TF-IDF and timestamps) and fine-tuned with task-specific augmentations, they achieve new state-of-the-art performance. Another interesting observation is that the widely reported B-Cubed metrics are biased towards large clusters and hence do not capture cluster fragmentation on smaller clusters well. Since clusters corresponding to emerging events are small, and errors made on such clusters are highly undesirable, the authors suggest using an additional metric, CEAF-e, to evaluate models for this task.
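The assignment step of such a pipeline can be sketched compactly. The snippet below is illustrative only, not the paper's model: it scores each existing cluster with a fixed-weight mix of TF-IDF similarity, dense-embedding similarity, and time proximity, and opens a new cluster when no score clears a threshold; the weights and threshold are placeholder values.

```python
# Illustrative sketch (not the paper's model) of assigning an incoming article to the
# most compatible event cluster, or starting a new one.
import numpy as np

def assign(article, clusters, w_tfidf=0.4, w_emb=0.4, w_time=0.2,
           tau_days=3.0, threshold=0.5):
    # article: dict with unit-normalized "tfidf" and "emb" vectors and a "time" in days.
    # clusters: list of dicts holding each cluster's averaged "tfidf"/"emb" and latest "time".
    best_i, best_score = None, -1.0
    for i, cluster in enumerate(clusters):
        score = (w_tfidf * float(article["tfidf"] @ cluster["tfidf"])
                 + w_emb * float(article["emb"] @ cluster["emb"])
                 + w_time * np.exp(-abs(article["time"] - cluster["time"]) / tau_days))
        if score > best_score:
            best_i, best_score = i, score
    if best_score >= threshold:
        return best_i                 # join the most compatible existing event
    clusters.append(dict(article))    # otherwise this article starts a new event
    return len(clusters) - 1
```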
Segmenting Subtitles for Correcting ASR Segmentation Errors David Wan Columbia University, Chris Kedzie Columbia University, Faisal Ladhak Columbia University, Elsbeth Turcan Columbia University, Petra Galuszkova University of Maryland, Elena Zotkina University of Maryland, Zhengping Jiang Columbia University, Peter Bell University of Edinburgh, and Kathleen McKeown Columbia University
For the task of spoken language translation, the usual approach is a pipeline consisting of an Automatic Speech Recognition (ASR) system that transforms audio files into words and utterances in the original language and a Machine Translation (MT) system that translates the utterances into the target language. However, this setup may suffer from input-output mismatches: ASR segments utterances by acoustic information such as pauses, and thus may produce run-on sentences or sentence fragments, while MT is usually trained on proper sentences without such issues and may not perform well in such a setting. This paper proposes the use of an intermediate model to segment utterances into sentences to improve performance on MT as well as other downstream tasks.
One crucial problem for developing such models is the lack of suitable training data for segmentation, especially when the languages involved are low-resourced. To this end, this paper also proposes a way to use a subtitles dataset as proxy speech data, as well as a way to create synthetic acoustic utterances that mimic common ASR errors for the model to learn to fix. Using a simple neural tagging model, the authors show improvements over the baseline ASR segmentation on MT for Lithuanian, Bulgarian, and Farsi. A surprising finding is that the segmentation model most improves the translation quality of more syntactically complex segments.
It has been well-documented for several languages that human interlocutors tend to adapt their linguistic productions to become more similar to each other. This behavior, known as entrainment, affects lexical choice as well, both with regard to specific words, such as referring expressions, and overall style.
Lexical entrainment is the behavior that causes the words that speakers use in a conversation to become more similar over time. Entrainment more broadly is a human behavior causing interlocutors to adapt to each other to become more similar. Its effects are measurable but entrainment itself is not a measure.
This paper offers the first investigation of such lexical entrainment in Hebrew.
The analysis of Hebrew speakers interacting in a Map Task, a popular experimental setup, provides rich evidence of lexical entrainment. No clear pattern of differences is found between speaker pairs by the combination of their genders, nor between speakers by their individual gender. However, speakers in a position of less power are found to entrain more than those with greater power, which matches theoretical accounts.
Overall, the results mostly accord with those for American English. There is, however, a surprising lack of entrainment on a list of hedge words that were previously found to be highly entrained in English. This might be due to cultural differences between American and Israeli speakers that render adoption of a more tentative style less appropriate in the Hebrew context.
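For readers unfamiliar with how lexical entrainment is quantified, the sketch below shows one simple proxy in the spirit of earlier entrainment work; it is an illustration and not necessarily the measure used in this paper. For a chosen word set, it compares the relative frequency with which each speaker uses each word, so that scores closer to zero indicate more similar usage.

```python
# Illustrative lexical entrainment proxy (not necessarily this paper's measure).
from collections import Counter

def lexical_entrainment(tokens_a, tokens_b, word_set):
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = max(len(tokens_a), 1), max(len(tokens_b), 1)
    scores = []
    for w in word_set:
        fa, fb = freq_a[w] / n_a, freq_b[w] / n_b
        if fa + fb > 0:
            scores.append(-abs(fa - fb) / (fa + fb))  # 0 = identical usage, -1 = disjoint
    return sum(scores) / len(scores) if scores else 0.0

# Usage: score pairs of interlocutors over, e.g., a list of hedge words or frequent words;
# averages closer to zero indicate more entrained pairs.
```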
Entity-level Factual Consistency of Abstractive Text Summarization Feng Nan Amazon Web Services, Ramesh Nallapati Amazon Web Services, Zhiguo Wang Amazon Web Services, Cicero Nogueira dos Santos Amazon Web Services, Henghui Zhu Amazon Web Services, Dejiao Zhang Amazon Web Services, Kathleen McKeown Amazon Web Services & Columbia University, Bing Xiang Amazon Web Services
A key challenge for abstractive summarization is ensuring factual consistency of the generated summary with respect to the original document. For example, state-of-the-art models trained on existing datasets exhibit entity hallucination, generating names of entities that are not present in the source document.
The paper proposes a set of new metrics to quantify the entity-level factual consistency of generated summaries and shows that the entity hallucination problem can be alleviated by simply filtering the training data. In addition, the paper introduces a summary-worthy entity classification task to the training process as well as a joint entity and summary generation approach, which yields further improvements in entity-level metrics.
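As an illustration of what an entity-level consistency score can look like, the toy function below computes the fraction of named entities in a generated summary that also appear in the source document. It assumes entity extraction (for example, with an off-the-shelf NER tagger) has already been run, and the function name and interface are illustrative rather than the paper's exact metric definitions.

```python
def entity_precision(summary_entities, source_entities):
    """Share of named entities in the summary that also occur in the source (illustrative)."""
    source = {e.lower() for e in source_entities}
    summary = [e.lower() for e in summary_entities]
    return 1.0 if not summary else sum(e in source for e in summary) / len(summary)

# A hallucinated entity ("Acme Corp") lowers the score:
print(entity_precision(["Reuters", "Acme Corp"], ["Reuters", "London"]))  # 0.5
```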
Detecting arguments in online interactions is useful to understand how conflicts arise and get resolved. Users often employ figurative language, such as sarcasm, either as a persuasive device or to attack an opponent with an ad hominem argument. To further our understanding of the role of sarcasm in shaping the disagreement space, the paper presents a thorough experimental setup using a corpus annotated with both argumentative moves (agree/disagree) and sarcasm. The research exploits joint modeling in terms of (a) applying discrete features that are useful in detecting sarcasm to the task of argumentative relation classification (agree/disagree/none), and (b) multitask learning for argumentative relation classification and sarcasm detection using deep learning architectures (e.g., dual Long Short-Term Memory (LSTM) with hierarchical attention and Transformer-based architectures). The paper shows that modeling sarcasm improves the argumentative relation classification task (agree/disagree/none) in all setups.
Ideological attitudes and stances are often expressed through subtle meanings of words and phrases. Understanding these connotations is critical to recognize the cultural and emotional perspectives of the speaker. In this paper, the researchers use distant labeling to create a new lexical resource representing connotation aspects for nouns and adjectives. Their analysis shows that it aligns well with human judgments. Additionally, they present a method for creating lexical representations that capture connotations within the embedding space and show that using the embeddings provides a statistically significant improvement on the task of stance detection when data is limited.
Noah Huber-Feely (CS) and Alex Paskov (APAM), valedictorian and salutatorian of Columbia Engineering’s undergraduate class of 2021, have earned the School’s highest academic honors.
Jason Nieh has been named a Guggenheim Fellow for this year. Guggenheim Fellowships are intended for individuals who have already demonstrated exceptional capacity for productive scholarship or exceptional creative ability in the arts.
Alfred V. Aho, Lawrence Gussman Professor Emeritus of Computer Science, has won the 2020 Association for Computing Machinery (ACM) A.M. Turing Award, known informally as the “Nobel Prize of computing.” Professor Aho is being recognized for fundamental algorithms and theory underlying programming language implementation and for synthesizing these results and those of others in highly influential books that educated generations of computer scientists.
Assistant Professor Carl Vondrick has won the National Science Foundation’s (NSF) Faculty Early Career Development award for his proposal program to develop machine perception systems that robustly detect and track objects even when they disappear from sight, thereby enabling machines to build spatial awareness of their surroundings.
Samantha John (SEAS ’09) took her first computer science class and started to love programming during her senior year. She launched Hopscotch, an app that teaches kids how to code by building games, in 2013 and turned her passion for programming into a profitable business.
The CS@CU MS Bridge Program in Computer Science equips students from non-computer science backgrounds with the skills and knowledge necessary to build careers in technology.
The program is designed as a year of rigorous “bridge” coursework composed of introductory CS courses. This transition prepares students for a seamless entrance into Columbia Engineering’s MS program in Computer Science after the bridge year.
Each student takes a different path through the program, and most of the first cohort has now completed their bridge year. Below are some of their experiences from the past year.
Ethan Garry BA Mathematics and Statistics, University of Rochester 2016 Associate of the Society of Actuaries in 2019
I tried learning from online resources for several months before applying to the MS Bridge program. CS is a demanding discipline and in order to rapidly evolve my skills, I needed an intense academic setting that Columbia has provided for me. I took four bridge courses over the past year and started the MS this spring. The four bridge courses were excellent preparation for my current coursework and they also provided sufficient knowledge to land a front-end development internship with SuperWorld (a virtual real estate marketplace).
After the program, I hope to either become a front-end web developer or a full-stack web developer for a small tech company focused on helping local businesses.
Muhan Liang MS Strategic Communication, Columbia University School of Professional Studies 2020 BS Communications Boston University 2018
I come from a communications background and didn't have any prior knowledge of computer science. The MS Bridge program has provided a valuable, high-quality experience in gaining all the necessary foundational skills in computer science.
I am really excited about transitioning to the master’s program this semester!
Nand Chandravadia BS Neuroscience, University of California, Los Angeles 2017
I have done research on brain-computer interfaces that piqued my interest in the intersection of neuroscience and computer science. This pushed me to apply to the MS Bridge program to further my knowledge in CS and gain the necessary acumen and technical skills to pursue my research interests. The most important thing I have learned is to view the world from the perspective of a computer scientist. Especially when trying to tackle a research question, a computational mindset can be very helpful in distilling the essence of a question and coming up with a tractable solution (if one exists).
After the MS program, I plan on pursuing a PhD and I am very excited to take the knowledge learned from inside the classroom and apply it to the outside world.
Krystal Briggs BA Speech-Language Pathology and Linguistics, Brooklyn College Honors Academy 2017
I feel confident that Columbia’s MS Bridge Program will prepare me for a career in medical tech. The program provides exceptional academic advising with professors who truly care about the success of their students.
Once I reach the master’s portion of the program I look forward to specializing in bioinformatics and I plan to further extend my education and work towards bridging the gap in healthcare amongst underrepresented communities.
ZiJia Chen BA Liberal Arts, The New School 2020
I have started businesses where I created websites and applications but I realized that to truly make these startups successful I needed to learn more about CS and build my technical skills. I want to integrate what I will learn from the MS program with my design background to create interdisciplinary innovation and apply it to entrepreneurship.
I plan to take advantage of my time at Columbia to learn as much as I can, build relationships with classmates and professors, and work with Columbia’s Startup Lab to create something that will help humanity.
Matthew Duran BA Economics and Mathematics, School of General Studies of Columbia University 2020 Former 3B with the New York Yankees
Now that I have a good theoretical and programming foundation in computer science, I plan to complete the software systems track which will let me dive deeper into the core of modern computing and build systems that everyone uses. I will also take electives such as artificial intelligence and machine learning.
Honestly, I’m excited about the rest of the program because the possibilities are endless.
Lawrence Lai BS Accounting, SUNY Binghamton 2011
Aside from doing all the coursework and studying, I have made time to participate in a research project and join a couple of hackathons. These are some of the great opportunities offered by the school.
Overall, the program has been challenging and rewarding at the same time. I look forward to all the exciting work that I will be doing in grad school and beyond.
Shuran Song and Carl Vondrick are among the awardees chosen for their artificial intelligence (AI) research. The program aims to use AI for societal good.
The Distinguished Lecture series brings computer scientists to Columbia to discuss current issues and research that are affecting their particular fields. This year, eight experts covered topics ranging from machine learning and human-computer interaction to neural language models, law and public policy, psychology, and computer architecture.
Below are a couple of the lectures from prominent faculty from universities across the country.
The award recognizes Moti Yung's significant contributions to the field of cryptography, specifically, co-inventing malicious cryptography and pioneering contributions to distributed cryptosystems.
Research from the department was accepted to the 35th AAAI Conference on Artificial Intelligence. The conference promotes research in artificial intelligence (AI) and scientific exchange among AI researchers, practitioners, scientists, and engineers in affiliated disciplines.
One of the most exciting applications of modern artificial intelligence is to automatically discover scientific laws from experimental data. This is not a trivial problem, as it involves searching for a complex mathematical relationship over a large set of explanatory variables and operators that can be combined in an infinite number of ways. Inspired by the incredible success of deep learning in computer vision, the authors tackle this problem by adapting various successful network architectures into the symbolic law discovery pipeline. The novelty of this new approach is in (1) encoding the input data as an image with super-resolution, (2) developing an appropriate deep network pipeline, and (3) predicting the importance of each mathematical operator from the relationship image. This allows the exponentially large search to be prioritized according to the predicted importance of the symbolic operators, which can significantly accelerate the discovery process.
The model was then applied to a variety of plausible relationships, both simulated and drawn from physics and mathematics, involving different dimensions and constituents. The authors show that their model is able to identify the underlying operators from data, achieving high accuracy and AUC (91% and 0.96 on average, respectively) for systems with as many as ten independent variables. Their method significantly outperforms the current state of the art in terms of data fitting (R^2), discovery rate (recovering the true relationship), and succinctness (output formula complexity). The discovered equations can be seen as first drafts of scientific laws that can help scientists with (1) hypothesis building and (2) understanding the complex underlying structure of the studied phenomena. This novel approach holds real promise for speeding up the rate of scientific discovery.
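The sketch below illustrates only the flavor of step (3): a small convolutional network maps an image rendering of the data to a per-operator importance score. The operator vocabulary, architecture, and sizes are placeholder assumptions, not the authors' pipeline.

```python
# Illustrative toy model: image of the data in, per-operator importance scores out.
import torch
import torch.nn as nn

OPERATORS = ["add", "mul", "div", "sin", "exp", "log"]   # hypothetical operator vocabulary

class OperatorImportanceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(32 * 4 * 4, len(OPERATORS))

    def forward(self, img):                                # img: (batch, 1, H, W)
        return torch.sigmoid(self.head(self.features(img).flatten(1)))

# Scores near 1 suggest an operator likely appears in the underlying law and
# should be prioritized during the symbolic search.
scores = OperatorImportanceNet()(torch.rand(4, 1, 64, 64))  # shape (4, 6)
```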
One of the most common methods for policy learning used throughout the empirical sciences is randomization of the treatment assignment. This method is considered the gold standard within many disciplines and can be traced back, at least, to Fisher (Fisher 1935) and Neyman (Neyman 1923). Whenever human subjects are at the center of the experiment, unfortunately, issues of non-compliance arise: subjects do not necessarily follow the experimental protocol and end up doing what they want. It is well understood that under such conditions, unobserved confounding bias will emerge. For instance, subjects who did not comply with the treatment assignment may be precisely those who would have responded adversely to the treatment. Therefore, the actual causal effects of the treatment, when it is applied uniformly to the population, might be substantially weaker than the data reveal. Moreover, since one does not observe how subjects decide on and respond to the realized treatment, the actual treatment effects are not uniquely computable from the collected data; they are said to be non-identifiable.
Robins (1989) and Manski (1990) derived the first informative bounds over the causal effects from studies with imperfect compliance under a set of non-parametric assumptions called instrumental variables (IV). In their seminal work, Balke and Pearl (1994a, 1997) improved earlier results by employing an algebraic method to derive analytic expressions of the causal bounds, which are provably optimal. However, this approach assumes the primary outcome to be discrete and finite. Solving such a program could be intractable when high-dimensional context variables are present.
This paper presents novel non-parametric methods to bound causal effects on a continuous outcome from studies with imperfect compliance, and these methods generalize to settings with a high-dimensional context. Perhaps surprisingly, the paper introduces a latent data representation that characterizes all constraints on the observational and interventional distributions implied by the IV assumptions, even when the primary outcome is continuous. This representation allows one to reduce the original bounding problem to a series of linear programs; solving these programs therefore yields tight causal bounds.
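For the classical discrete setting that this work generalizes, the reduction to linear programming is easy to make concrete. The hedged sketch below computes Balke-Pearl-style bounds on the average treatment effect for a binary instrument, treatment, and outcome by optimizing over the 16 latent response types; it assumes the observed distribution is consistent and is meant only to illustrate the linear-programming idea, not the paper's continuous-outcome method.

```python
# Bounds on the ATE from a randomized instrument Z with imperfect compliance (binary case).
import itertools
import numpy as np
from scipy.optimize import linprog

# A latent "response type" is a pair (fx, fy): fx[z] is the treatment taken given
# instrument value z, and fy[x] is the outcome realized given treatment x.
FX = list(itertools.product([0, 1], repeat=2))
FY = list(itertools.product([0, 1], repeat=2))
TYPES = list(itertools.product(FX, FY))                   # 16 latent types

def ate_bounds(p_obs):
    """p_obs[z][x][y] = P(X=x, Y=y | Z=z), given as a 2x2x2 array."""
    A_eq, b_eq = [], []
    for z, x, y in itertools.product([0, 1], repeat=3):   # observed cells constrain type mass
        A_eq.append([1.0 if (fx[z] == x and fy[x] == y) else 0.0 for fx, fy in TYPES])
        b_eq.append(p_obs[z][x][y])
    A_eq.append([1.0] * len(TYPES))                       # type probabilities sum to one
    b_eq.append(1.0)
    # ATE = P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)) is linear in the type probabilities.
    c = np.array([fy[1] - fy[0] for _, fy in TYPES], dtype=float)
    lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    upper = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    return lower, upper
```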
Learning causal effects from observational data is a pervasive challenge found throughout the data-intensive sciences. General methods for determining the identifiability of a causal effect from a combination of observational data and causal knowledge about the underlying system are well understood in theory. In practice, however, there are still challenges to estimating identifiable causal functionals from finite samples. Recently, a novel approach named double/debiased machine learning (DML) (Chernozhukov et al. 2018) was proposed to learn such parameters by leveraging modern machine learning techniques, yielding estimators that are robust both to model misspecification ('doubly robust') and to slow convergence of the nuisance estimates ('debiased'). Still, DML has only been used for causal estimation in settings where the back-door condition (also known as conditional ignorability) holds.
This paper aims to bridge this gap by developing 'DML-ID,' a general class of estimators for any identifiable causal functional that exhibits the robustness properties of DML estimators. In particular, the authors provide a complete procedure for deriving an essential ingredient of the DML estimator called an influence function (IF) and construct a general class of estimators based on the IF. This means that one can estimate any identifiable causal functional while enjoying both robustness properties, double robustness and debiasedness.
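For readers less familiar with the back-door setting that DML was developed for, here is a minimal cross-fitted doubly robust (AIPW) sketch of an average-treatment-effect estimator. The learners, number of folds, and clipping constant are arbitrary illustrative choices, and this is the textbook recipe rather than the paper's DML-ID estimators.

```python
# Cross-fitted AIPW estimate of the ATE under the back-door condition (illustrative).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def dml_ate(X, t, y, n_splits=2):
    """X: covariate matrix, t: binary treatment, y: outcome (all numpy arrays)."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        # Nuisance models are fit on one fold and evaluated on the other ("cross-fitting").
        e = GradientBoostingClassifier().fit(X[train], t[train]).predict_proba(X[test])[:, 1]
        mu1 = GradientBoostingRegressor().fit(X[train][t[train] == 1], y[train][t[train] == 1]).predict(X[test])
        mu0 = GradientBoostingRegressor().fit(X[train][t[train] == 0], y[train][t[train] == 0]).predict(X[test])
        e = np.clip(e, 1e-3, 1 - 1e-3)
        tt, yy = t[test], y[test]
        # AIPW score: outcome-model term plus inverse-propensity correction term.
        psi[test] = mu1 - mu0 + tt * (yy - mu1) / e - (1 - tt) * (yy - mu0) / (1 - e)
    return psi.mean()
```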
The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expressions with the proposals.
In this paper, the researchers argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: proposals are generated solely based on detection confidence (i.e., expression-agnostic), yet the grounding stage hopes that they contain all the right instances mentioned in the expression (i.e., an expression-aware requirement). Due to this mismatch, current two-stage methods suffer from a severe performance drop between detected and ground-truth proposals.
The paper proposes Ref-NMS, the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects and introduces a lightweight module to predict a score for aligning each box with a critical object. These scores guide the NMS operation to filter out boxes irrelevant to the expression, increasing the recall of critical objects and resulting in significantly improved grounding performance.
Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS. Code is available at: https://github.com/ChopinSharp/ref-nms.
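A rough, hypothetical sketch of the expression-aware idea follows: standard greedy NMS, except that boxes are ranked by detection confidence fused with an expression-relatedness score. In Ref-NMS that score is produced by a learned module; here it is simply an input, and the multiplicative fusion is an assumption for illustration.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def expression_aware_nms(boxes, det_scores, rel_scores, iou_thr=0.5):
    """Greedy NMS ranked by detection confidence times expression-relatedness (toy fusion)."""
    fused = np.asarray(det_scores) * np.asarray(rel_scores)
    order = np.argsort(-fused)
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        mask = np.array([iou(boxes[i], boxes[j]) < iou_thr for j in rest], dtype=bool)
        order = rest[mask]
    return keep   # indices of proposals passed on to the grounding stage
```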
CS researchers are among the recipients of the inaugural awards from the Columbia Center of Artificial Intelligence Technology (CAIT). Amazon is providing $5 million in funding over five years to support research, education, and outreach programs.
Jihye Kwon, a computer engineering PhD student, talks about her research projects and what it took to win a Best Paper award.
Jihye Kwon
What drew you to computer engineering, specifically the application of machine learning to computer-aided design? What questions or issues do you hope to answer?
I was attracted to the concept of a computer: a machine that performs calculations. I found it very interesting how modern computers evolved from executing one instruction at a time to executing many instructions simultaneously by exploiting multiple levels of parallelism. Still, various challenges remained, or newly arose, so I dreamed about designing a brand-new computer system. That is what I had in mind when coming to Columbia.
At the beginning of my PhD, I experimented and learned how to design the core parts of special-purpose computers, using computer-aided design tools. I also explored machine learning from both theoretical and practical perspectives. These activities led me to work on my current research problems.
In advanced computer-aided design of computer systems, computers solve many complex optimization problems in steps to generate a final design. They do so as guided by the designers by means of configurable 'knobs'. My focus is on the designers' work.
For a target system, designers run the computer-aided design tools repeatedly with the many different knob configurations until the tools output final designs with optimal or desired properties, e.g., in timing, area, and power. I wondered if machines can learn, from designers’ previous work, how to configure the knobs to optimize a new target system. Can designers virtually collaborate across time and tasks through the machine learning models? These are the main questions that I hope to answer.
Could you talk about your research and how you collaborated with other groups? Was this something you considered when applying to Columbia – that there are opportunities to do multi-disciplinary work?
When I was applying to Columbia, I wished I could have collaboration opportunities to study and work in the interdisciplinary research communities at the center of New York City. I wanted to explore applications of computer science in different areas to eventually gain insight and inspiration for my own research, which is centered at computer engineering.
Fortunately, these were realized as I worked with my advisor, Professor Luca Carloni. I was invited to join the project “Energy Efficient Computing with Chip-Based Photonics”, which is a part of a large initiative supported by the government and industry. In this project, I worked closely with Lightwave Research Laboratory in Electrical Engineering on a new optical computing system. We proposed the concept of a next-generation computing system that is co-designed with silicon photonics and electronic circuitry, in order to overcome the fundamental and physical limitations of today’s computers.
Another project on optical communication was initiated from a student project that I mentored in my advisor’s class, Embedded Scalable Platforms. This project investigated the use of photonic switches in optically-connected memory systems for deep learning applications.
Outside Columbia, I have also collaborated with researchers at IBM TJ Watson Research Center via my summer internships on the project of auto-tuning computer-aided design flows for commercial supercomputers. All these collaborations opened new horizons for me.
You won the MLCAD 2020 Best Paper award for your research, can you talk about your process – how did the research come about? How long did it take you to complete the work? What were the things you had to overcome?
In this work, I proposed a novel machine learning approach for computer-aided design optimization of hardware accelerators. I wanted to address this problem because it is computationally very expensive to explore the entire optimization space. It took me about one year to complete the work. One of the biggest difficulties I faced was the limited availability of the data for applying machine learning to the problem.
Then, I found out that transfer learning has been recently successfully applied in other areas with limited data. In transfer learning, a model trained for a related problem (e.g., natural image recognition) is transferred to aid the machine learning for the target problem (e.g., face recognition). Hence, I tried to apply transfer learning to my research problem. I trained a neural network model for a different accelerator design, and transferred the model to predict the design properties of a target accelerator.
However, the transferred model did not perform well in this case. I realized that due to the diverse characteristics of the accelerators, I needed to distinguish which piece of the source information should be transferred. Based on this intuition, I constructed a series of new models, and eventually, proposed one with promising performance. While it was a long process of building new models without knowing the answers, my advisor greatly encouraged me in our discussions to keep moving forward, and it was very rewarding in the end.
Looking back, how have you grown as a researcher and a person?
Besides expanding my problem-solving capabilities and technical skills, I have grown to be a better presenter and communicator. One of the tasks of a researcher is to explain one's work to various groups and different types of audiences. I had a number of opportunities to present my work at academic conferences, seminars at companies, lightning talks, and annual project reviews. Initially, I struggled to meet the interests of audiences whose expertise spans a diverse range of areas and levels. Through those opportunities I received very helpful feedback, tried to ask myself questions from other people's perspectives, and progressively learned to keep a good balance between abstraction and elaboration.
Also, by interacting with a lot of students with heterogeneous backgrounds in the classes I TA’ed, I have learned to understand what their questions mean and where they come from. Based on that, I tried to adjust my answers to have more relatable conversations. From those conversations, sometimes the students found the topics very interesting, and sometimes I learned something new from them. It was such a great pleasure to inspire others and to be inspired. I think those experiences have made me a better researcher and person.
In Fall 2017, I received an invitation from Julia Di, the president of Womxn in Computer Science (WiCS), and was impressed by the passionate and caring board members working on the common goal of supporting the advancement of womxn in computer science. In my second year, I launched the WiCS Lightning Talks for students with research experience to share their work and stories. The goal was for young students to get to know more about research and demystify the process.
I am one of the few women at Columbia in my research area of computer engineering and would like to contribute to inspiring the next generation to join us.
What was the highlight of your time at Columbia?
Every moment was special for me. Some of the highlights were during happy hour with members of the fishbowl. The fishbowl is a large office occupied by the majority of PhD students in computer engineering. We call it the fishbowl, because it is surrounded by large windows and students inside look like small fishes. Once, my colleagues talked about their memories of old computers that I had never seen. I enjoyed imagining the machines from their descriptions, and thinking about different types and generations of computers.
What is your advice to students on how to navigate their time at Columbia?
Explore, experience, and exploit. There are recommended lists of classes, activities, and companies, depending on your track and interests, but no one is exactly like you. There is such a great variety of opportunities and resources at Columbia and in New York City. I hope you can spend enough time exploring them and get involved in many ways before determining your academic and career goals.
Is there anything else that you think people should know?
Columbia is beautiful in the snow! It gets pretty windy in the winter, so please be aware if you are coming from warmer places. There are many places where you can study but Avery Library is my favorite library on campus. If you have any questions or opinions on this Q&A story, please feel free to drop me a line!
The University of San Diego has named Maritza Johnson (PhD ’12) as the director of the new center that will address issues concerning big data and artificial intelligence, and their social implications — ethics, privacy, and fairness.
The Computer Science Department Student Services Team is excited to continue The How She Made It Series, a new Columbia University Computer Science Department initiative, which features fabulous womxn-identifying industry professionals and prestigious speakers!
How She Made It interviews diverse, trailblazing professionals, and allows students to learn more about how the speakers launched into industry, overcame obstacles, and became the successful superstars they are today!
The computer science department welcomes two new faculty members. These new CS faculty are facilitating research and learning in natural language processing and quantum computing.
Zhou Yu Assistant Professor PhD Computer Science, Carnegie Mellon University 2017 BS Computer Science and BA Linguistics, Chu Kochen Honors College at Zhejiang University 2011
Zhou Yu designs algorithms for real-time intelligent interactive systems that coordinate with user actions beyond spoken language, including non-verbal behaviors, to achieve effective and natural communication. In particular, she optimizes human-machine communication via studies of multimodal sensing and analysis, speech and natural language processing, machine learning, and human-computer interaction.
Yu aims to bring together all the areas above to design, implement and deploy end-to-end real-time interactive intelligent systems that are able to plan globally considering interaction history and current user actions to achieve better user experience and task performance.
Prior to Columbia, Yu was an Assistant Professor in the Computer Science Department at University of California, Davis, where she directed the Davis NLP Lab. She is the winner of a Google GCP Credit Award and has been named an Amazon Alexa Prize Champion and a 2018 Forbes 30 Under 30 in Science.
Yu is teaching Dialog Systems this spring and is looking for PhD, master's, and undergraduate students with a background in natural language processing to join her research projects.
Henry Yuen Assistant Professor PhD Computer Science, Massachusetts Institute of Technology (MIT) 2016 BA Mathematics, University of Southern California 2010
Henry Yuen is a theoretical computer scientist whose goal is to understand the fundamental principles of computation in a universe governed by quantum physics (such as ours). He studies questions at the interface of quantum information theory, computational complexity theory, and cryptography.
He has made a number of contributions to the theory of quantum multiprover interactive proofs, including the discovery that such interactive proofs can verify solutions to uncomputable problems. Yuen also works on quantum cryptography; some of his contributions include designing protocols for infinite randomness expansion using untrusted quantum hardware.
Yuen was a postdoctoral associate at UC Berkeley from 2016 to 2018 and an assistant professor in Computer Science and Mathematics at the University of Toronto from 2018 to 2020. He is the recipient of a Simons-Berkeley Research Fellowship and a Google Quantum Research Award.
This semester he is teaching Introduction to Quantum Computing, which is aimed at graduate students and advanced undergraduates looking to learn the basics of quantum computation. For his research projects, he is currently looking for talented students to work on fundamental questions that cut across mathematics, computer science, and physics.
Six papers from CS researchers were accepted to the ACM-SIAM Symposium on Discrete Algorithms (SODA ’21). The conference focuses on efficient algorithms and data structures for discrete problems.
The paper studies the approximate nearest neighbor problem in high-dimensional spaces. Namely, the algorithmic task is to produce a data structure for answering queries of the following form: what is the (approximately) closest point to a query q within a dataset P. This problem is a fundamental task in modern data analysis, and the paper gives new and improved approximations for one of the most commonly studied metric spaces, the ℓp spaces, as well as generalized versions of the Hamming metric.
The surprising aspect of this work is its data-dependent decomposition schemes for high-dimensional vectors; while LSH partitions are well known for ℓp spaces when p ∈ [1, 2], these fail for higher values of p. The present work shows how to build decompositions that are not LSH but nevertheless solve the approximate nearest neighbor problem efficiently.
In the dynamic graph stream model, a set of n vertices is given in advance, and a stream of edge insertions and deletions is observed by a player, who can use only a very small amount of memory to store an impression of the stream. Once the stream is exhausted, the player is required to answer a query (known in advance) regarding the observed graph. Algorithms in this model usually use linear sketching techniques. Most notably, Ahn, Guha, and McGregor '12 showed how to compute a spanning tree using \tilde{O}(n) space, while Kapralov et al. '14 devised an algorithm for computing a spectral sparsifier.
The question of computing a spanner, or more generally of estimating shortest-path distances, is poorly understood in this model. Multi-pass algorithms (which are allowed to observe the stream several times) had been devised; however, no single-pass algorithm was known. This paper provides the first single-pass algorithm, which uses \tilde{O}(n) space while returning an \tilde{O}(n^{2/3})-approximation of all distances. Even though this distortion is very large, the authors conjecture that it is close to optimal.
The Fréchet distance between two curves P and Q is often described by the man-dog analogy, in which a man walks along P holding a leash connected to his dog, who walks along Q, and the goal is to minimize the length of the leash that allows them to fully traverse their curves without backtracking. The Fréchet distance is well studied and has numerous applications. Given two curves with n points each, a simple dynamic program can compute the Fréchet distance between them in O(n^2) time. However, under standard complexity assumptions (SETH), there is no strongly subquadratic algorithm computing the Fréchet distance, even if the solution may be approximated up to a factor of 3.
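For intuition, the quadratic dynamic program is easiest to state for the discrete variant of the distance, in which the man and dog hop between the curves' vertices. The sketch below is that textbook recursion, included only for illustration; it is not the paper's oracle.

```python
# O(n*m) dynamic program for the discrete Fréchet distance (recursive, memoized).
from functools import lru_cache
from math import dist

def discrete_frechet(P, Q):
    @lru_cache(maxsize=None)
    def couple(i, j):
        d = dist(P[i], Q[j])                 # leash length at this pair of vertices
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(couple(0, j - 1), d)
        if j == 0:
            return max(couple(i - 1, 0), d)
        return max(min(couple(i - 1, j), couple(i - 1, j - 1), couple(i, j - 1)), d)
    return couple(len(P) - 1, len(Q) - 1)    # for very long curves, fill the table iteratively

print(discrete_frechet(((0, 0), (1, 0), (2, 0)), ((0, 1), (1, 1), (2, 1))))  # 1.0
```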
To overcome this quadratic barrier, this paper studies the question of distance oracles. Here a curve P is preprocessed in advance such that, given a query curve Q, the Fréchet distance between P and Q can be approximated up to a 1+\epsilon factor in linear time. The authors construct a distance oracle with an optimal tradeoff between approximation factor, storage space, and query time.
Surprisingly, even when the curve P is extremely long and its points can be observed only once, in a streaming fashion, the authors construct a distance oracle with the exact same parameters.
Think of a two-sided market with a bunch of “sellers,” such as Etsy sellers, Airbnb hosts, or employees, and a bunch of “buyers,” such as Etsy customers, Airbnb renters, or employers. A platform, such as Airbnb, sits in the middle matching buyers and sellers. In order to maximize the platform’s value to the market’s participants, the gains from trade, that is, the increase in value created in the market by the matches, should be maximized.
Maximizing gains from trade using an algorithm that (1) aligns participants incentives, (2) ensures participants don’t regret participating, and (3) does not require the platform to lose money is known to be provably impossible even for one buyer, one seller, and one item. Further, as with questions of revenue maximization, the complexity suffers drastically as soon as a buyer is interested in more than one item.
This paper investigates the setting where a buyer is interested in many different items, each owned by a different seller, and gives the first guarantee for gains from trade in this setting. It provides an O(log n)-approximation to the optimal gains from trade subject to properties (1)-(3), using a combination of simple mechanisms: fixed posted pricings, buyer offering mechanisms, and a new “seller-adjusted posted price” mechanism, which is surprisingly capable of earning far more gains from trade than the others in some instances.
In the trace reconstruction problem, an unknown source string is sent through a probabilistic deletion channel which independently deletes each bit with a certain deletion rate and concatenates the surviving bits, yielding a trace of the unknown string. The problem is to reconstruct the unknown string given access to its independent traces.
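The channel itself is simple to simulate; the sketch below passes a bit string through the deletion channel described above to produce independent traces (names and parameters are illustrative).

```python
import random

def trace(source, deletion_rate, rng=random):
    """Each bit is independently deleted with probability deletion_rate; survivors are concatenated."""
    return "".join(bit for bit in source if rng.random() >= deletion_rate)

random.seed(0)
unknown = "1011001110001011"
traces = [trace(unknown, 0.25) for _ in range(5)]   # independent traces given to the reconstruction algorithm
```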
The main result of the paper is a polynomial-time algorithm for the trace reconstruction problem in the smoothed analysis model, where any “worst-case” string is perturbed and the goal is to reconstruct its perturbed version with high probability.
The researchers’ approach is based on reconstructing a string from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of the work is a new efficient procedure for reconstructing the multiset of all O(log n)-length subwords of any source string using its traces.
The paper gives a nearly-optimal algorithm for testing the uniformity of distributions supported on hypercubes under the subcube conditioning model, where one can draw samples from the unknown distribution after fixing a subset of variables. The key technical component is a natural notion of random restrictions for distributions over hypercubes, and a quantitative analysis of how such a restriction affects the mean vector of the distribution. Along the way, the researchers also considered the problem of mean testing with independent samples and provide a nearly-optimal algorithm.
A Columbia Engineering robot has learned to predict its partner robot’s future actions and goals based on just a few initial video frames. The study is part of a broader effort to endow robots with the ability to understand and anticipate the goals of other robots, purely from visual observations.