Graduate students from the department have been selected to receive fellowships and scholarships. The diverse group is a mix of students who are new to Columbia and continuing students who have received fellowships for the year.
Google Fellowship
The Google PhD Fellowship Program was created to recognize outstanding graduate students doing exceptional and innovative research in areas relevant to computer science and related fields.
Yiru Chen
Yiru Chen is a fourth-year Ph.D. student who works with Associate Professor Eugene Wu. Her research interests are database systems, human-computer interaction, and data exploration. Her work focuses on improving database usability by automatically generating database interfaces for interactive data analysis.
Chen graduated from Peking University with a B.S. in computer science summa cum laude and a B.A. in Economics in 2018. She enjoys cycling and playing the violin whenever she has free time.
NSF Graduate Research Fellowship Program (GRFP)
The GRFP is a five-year fellowship that recognizes and supports outstanding graduate students in NSF-supported STEM disciplines who are pursuing research-based master’s and doctoral degrees.
Philippe Chlenski
Philippe Chlenski is interested in developing and applying computational techniques to biological problems, particularly machine learning for microbial dynamics. He is a second-year PhD student in the Pe’er lab. Prior to Columbia, he worked for two years at the Fellowship for Interpretation of Genomes at the Argonne National Lab.
Chlenski graduated in 2018 from Yale University with a Bachelor’s degree in mathematics and philosophy. He also holds an Associate’s degree in liberal arts from Deep Springs College.
Sam Fereidooni
Sam Fereidooni is interested in investigating semantic representations through the lens of both cognitive neuroscience and natural language processing. He hopes that the eventual findings from his work will lead to improved treatments for those who suffer from language processing and production disorders. He is a first-year PhD student in the Theory Group, and he is advised by Professor Christos Papadimitriou.
Fereidooni graduated in 2021 from Yale University with a B.S. in Cognitive Science, and a B.S. in Statistics and Data Science. Sam’s undergraduate studies were supported by the Questbridge Foundation National College Match scholarship, the Richter Undergraduate Research fellowship, and the Yale Club of New York City Charles S. Guggenheimer scholarship.
Shashaank N
Shashaank N is a first-year PhD student who will be advised by Assistant Professor David Knowles. His research interests are in computational genomics and neuroscience, with a focus on auditory processing disorders in the brain.
Shashaank recently graduated with an MS in Computer Science from Columbia University in 2021. He completed a BS in Interdisciplinary Studies from Western Kentucky University (WKU) in 2019 and received the Scholar of the College academic award.
Meghna Pancholi
Meghna Pancholi is a second-year PhD student advised by Associate Professor Simha Sethumadhavan. She is interested in cloud computing, systems security, and microservices. Before Columbia, Meghna was an undergraduate researcher at Cornell University where she worked on improving the performance of microservices applications with machine learning techniques.
Meghna graduated from Cornell University in 2020 with a BS in Computer Science.
Clayton Sanford
Clayton Sanford is a third-year PhD student working with Professors Rocco Servedio and Daniel Hsu on machine learning theory. The motivating goal of his research is to understand mathematically why deep learning performs so well in practice. Clayton’s work on the approximation capabilities of neural networks has been published at the COLT 2021 conference. He is a member of the CS Theory Group.
Clayton received an ScB in Applied Math and Computer Science with honors from Brown University in 2018.
Sky Wang
Sky Wang is an incoming first-year PhD student who will work with Assistant Professors Zhou Yu and Smaranda Muresan. His work focuses on natural language processing; he is interested in leveraging computational methods to understand social aspects of language and in using those insights to create more effective and more equitable language technologies. He is particularly interested in situated dialogue systems, computational social science, and cultural analytics.
Wang graduated in 2020 from the University of Michigan with a B.S.E in Computer Science. He is a 2021 recipient of the University of Michigan’s EECS Undergraduate Outstanding Research Award and also received an honorable mention for the Computing Research Association Outstanding Undergraduate Research Award in 2021. He received a Best Poster award from the University of Michigan AI Symposium in 2018 and was recognized as a finalist in the NASA Goddard Space Flight Center Intern Research Fair in 2018.
Joseph Zuckerman
Joseph Zuckerman is a second-year PhD student in computer science at Columbia University, where he works in the System-Level Design group, advised by Professor Luca Carloni. His research interests include architectures, runtime management, and agile design methodologies for many-accelerator systems-on-chip.
Zuckerman contributes as one of the main developers of ESP, an open-source research platform for heterogeneous system-on-chip design. In 2019, he completed his S.B. in electrical engineering at Harvard University, during which time he completed internships at NVIDIA and the NASA Jet Propulsion Lab.
SEAS Fellowships
Columbia’s School of Engineering and Applied Science established the Presidential and SEAS fellowships to recruit outstanding students from around the world to pursue graduate studies at the school.
Blavatnik Fellow
Sebastian Salazar
Sebastian Salazar’s research interests include machine learning and ethical AI. At Columbia, his work will focus on counterfactual predictions and the actionability of machine learning models. He is a first-year PhD student who will work under the guidance of Ansaf Salleb-Aouissi.
Sebastian graduated magna cum laude from Columbia University in 2021 with a B.S. in Applied Physics.
Dean’s Fellows
Huy Ha
Huy Ha is an incoming first-year PhD student interested in computer vision, natural language processing, and robot learning. His research studies how embodied intelligence could combine information from different modalities (vision, language, interaction) to understand its environment, solve tasks, and assist people. He is advised by Assistant Professor Shuran Song and is a member of the Columbia Artificial Intelligence and Robotics (CAIR) lab.
Ha graduated in 2021 with a BS in Computer Science from Columbia University. He was a Dean’s Fellow and received the Theodore Bashkow Award, and he did summer research as a Bonomi Summer Scholar. In his free time, Ha likes to take photos, rock climb, bike, and train his two border collies for frisbee.
Yun-Yun Tsai
A first-year PhD student, Yun-Yun Tsai works with Professor Junfeng Yang. Her research interests are in security and artificial intelligence. In particular, she is interested in improving the robustness of neural networks and machine learning (ML) algorithms so that they make fewer mistakes on malicious samples. She will work on research aimed at making AI applications less fragile when faced with unusual inputs.
Tsai received B.Sc. and M.Sc. degrees in computer science from National Tsing Hua University (NTHU), Taiwan, in 2014 and 2018, respectively. Previously, she was advised by Professor Tsung-Yi Ho and by Dr. Pin-Yu Chen of the Trusted AI group at the IBM Thomas J. Watson Research Center in New York.
Mudd Fellow
Anjali Das
Anjali Das is a first-year PhD student who works with Professors Itsik Pe’er and David Knowles. Her research interest is in developing and applying machine learning methods to problems in genomics. Specifically, she is interested in the genetics of neurological diseases.
Das graduated from the University of Chicago in June of 2020 with a BS in statistics and a minor in computer science. After graduating, she worked as a data scientist at UChicago’s Research Computing Center before joining Columbia.
Josh Alman, Toniann Pitassi, and Richard Zemel join the department. They will strengthen research and teaching in CS theory, machine learning, and artificial intelligence.
Josh Alman
Assistant Professor
PhD Computer Science, Massachusetts Institute of Technology 2019
MS Computer Science, Stanford University 2016
BS Mathematics, Massachusetts Institute of Technology 2014
Josh Alman is a theoretical computer scientist who works broadly throughout algorithm design and complexity theory. His current research focuses on algorithms for fundamental algebraic problems. In particular, he is interested in how quickly one can multiply matrices and compute important linear transforms like Fourier transforms, and how to apply algebraic tools like these to new problems throughout computer science.
Alman joins the Theory Group and looks forward to working with students who have a background in theoretical computer science or mathematics on various projects. This Fall, he will teach a graduate class on algebraic techniques in algorithms and complexity.
Toniann Pitassi
Jeffrey L. and Brenda Bleustein Professor of Engineering
PhD Computer Science, University of Toronto 1992
MS Computer Science, Pennsylvania State University 1985
BS Chemistry & Computer Science, Pennsylvania State University 1985
Pitassi’s research area is computational complexity: what are the inherent limitations on the resources (time, space, randomness) required to solve fundamental computational problems? An important direction aimed at resolving the ultimate question of this type, the P versus NP question, is propositional proof complexity, which studies the difficulty of proving tautological statements in standard proof systems. She has also worked extensively in communication complexity which studies how much information must be communicated between two or more players in order to compute a joint function of their inputs. Her other research interests lie in the foundations of machine learning, particularly in the areas of privacy, fairness, and reproducibility.
Previously, Pitassi was the Bell Research Chair at the University of Toronto, a research chair of the Canadian Institute for Advanced Research, and a research lead at the Schwartz-Reisman Institute for Technology and Society. She currently holds a five-year appointment as a visiting professor at the Institute for Advanced Study. Pitassi is the 2021 EATCS Award recipient and was an invited speaker at the International Congress of Mathematicians.
At Columbia, Pitassi joins the department’s Theory Group and the Machine Learning Group. She is excited to collaborate with new colleagues and graduate students at Columbia and to explore New York. Her hobbies and outside interests are constantly changing with the latest being stained glass.
Richard Zemel
Professor of Computer Science
PhD Computer Science, University of Toronto 1994
MS Computer Science, University of Toronto 1989
BA History and Science, Harvard University 1984
Richard Zemel’s research focuses on machine learning and artificial intelligence. He is also interested in natural intelligence, including neuroscience and cognitive science. His recent research targets learning with few labels and creating robust and controllable machine learning systems, which can transfer to a variety of tasks and domains. He also has a strong interest in algorithmic fairness.
Previously, Zemel was the NSERC Industrial Research Chair in Machine Learning at the University of Toronto, and co-founded and served as the Research Director of the Vector Institute for Artificial Intelligence. He is the recipient of an ONR Young Investigator Award, an NVIDIA Pioneer of AI Award, and is a Fellow of the Canadian Institute for Advanced Research.
This Fall, he will teach a course on Neural Networks and Deep Learning, and he will teach a seminar class on machine learning in the Spring term. Zemel is looking for PhD students who are interested in machine learning to join his research group. In his spare time, he enjoys hockey, squash, and biking, as well as eating.
Research from the department was accepted to the 34th Annual Conference on Learning Theory (COLT 2021). The conference highlights research on the theoretical aspects of machine learning.
Below are the abstracts and links to the accepted papers.
Size and Depth Separation in Approximating Natural Functions with Neural Networks
Gal Vardi Weizmann Institute of Science, Daniel Reichman Worcester Polytechnic Institute, Toniann Pitassi Columbia University, Ohad Shamir Weizmann Institute of Science
When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions “benign” and explore the benefits of size and depth for approximation of benign functions with ReLU networks. As we show, this problem is more challenging than the corresponding problem for non-benign functions. We give complexity-theoretic barriers to showing depth lower bounds: proving existence of a benign function that cannot be approximated by polynomial-sized networks of depth 4 would settle longstanding open problems in computational complexity. It implies that beyond depth 4 there is a barrier to showing depth-separation for benign functions, even between networks of constant depth and networks of nonconstant depth. We also study size separation, namely, whether there are benign functions that can be approximated with networks of size O(s(d)), but not with networks of size O(s′(d)). We show a complexity-theoretic barrier to proving such results beyond size O(d log²(d)), but also show an explicit benign function that can be approximated with networks of size O(d) and not with networks of size o(d/log d). For approximation in the L∞ sense we achieve such separation already between size O(d) and size o(d). Moreover, we show superpolynomial size lower bounds and barriers to such lower bounds, depending on the assumptions on the function. Our size-separation results rely on an analysis of size lower bounds for Boolean functions, which is of independent interest: we show linear size lower bounds for computing explicit Boolean functions (such as set disjointness) with neural networks and threshold circuits.
Learning sparse mixtures of permutations from noisy information
Rocco Servedio Columbia University, Anindya De University of Pennsylvania, Ryan O’Donnell Carnegie Mellon University
We study the problem of learning an unknown mixture of k permutations over n elements, given access to noisy samples drawn from the unknown mixture. We consider a range of different noise models, including natural variants of the “heat kernel” noise framework and the Mallows model. We give an algorithm which, for each of these noise models, learns the unknown mixture to high accuracy under mild assumptions and runs in n^O(log k) time. Our approach is based on a new procedure that recovers an unknown mixture of permutations from noisy higher-order marginals.
Learning and testing junta distributions with subcube conditioning
Xi Chen Columbia University, Rajesh Jayaram Carnegie Mellon University, Amit Levi University of Waterloo, Erik Waingarten Stanford University
We study the problems of learning and testing junta distributions on {−1, 1}^n with respect to the uniform distribution, where a distribution p is a k-junta if its probability mass function p(x) depends on a subset of at most k variables. The main contribution is an algorithm for finding relevant coordinates in a k-junta distribution with subcube conditioning (Bhattacharyya and Chakraborty, 2018; Canonne et al., 2019). We give two applications:
• An algorithm for learning k-junta distributions with Õ(k/ε²) log n + O(2^k/ε²) subcube conditioning queries, and
• An algorithm for testing k-junta distributions with Õ((k + √n)/ε²) subcube conditioning queries.
All our algorithms are optimal up to poly-logarithmic factors. Our results show that subcube conditioning, as a natural model for accessing high-dimensional distributions, enables significant savings in learning and testing junta distributions compared to the standard sampling model. This addresses an open question posed by Aliakbarpour et al. (2016).
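For readers less familiar with the model, the two notions the abstract relies on can be written as follows (a standard formulation, not an excerpt from the paper): a distribution p over {−1, 1}^n is a k-junta if there is a set J ⊆ [n] with |J| ≤ k such that
\[
p(x) = p(y) \quad \text{whenever } x_J = y_J,
\]
and a subcube conditioning query specifies a restriction ρ ∈ {−1, 1, *}^n and returns a sample x ∼ p conditioned on x_i = ρ_i for every coordinate i with ρ_i ≠ *.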
Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information
Emmanouil Vasileios Vlatakis-Gkaragkounis Columbia University, Angeliki Giannou National Technical University of Athens, Panayotis Mertikopoulos Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France & Criteo AI Lab
In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal “follow the regularized leader” (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter – from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This equivalence extends existing continuous-time versions of the “folk theorem” of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games.
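As a point of reference (standard definitions rather than material from the paper), the FTRL update the abstract refers to has each player choose
\[
x_{t+1} \in \operatorname*{arg\,max}_{x \in \mathcal{X}} \Big\{ \sum_{s=1}^{t} \langle \hat{v}_s, x \rangle \;-\; \frac{1}{\eta_t}\, h(x) \Big\},
\]
where \(\hat{v}_s\) is the (possibly noisy or bandit-estimated) payoff feedback at stage s, h is the regularizer, and \(\eta_t\) is a step-size. A Nash equilibrium is strict when each player's equilibrium strategy is the unique best response to the other players' strategies, which is exactly the property the paper shows to be equivalent to stability under these dynamics.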
Reconstructing weighted voting schemes from partial information about their power indices
Emmanouil Vasileios Vlatakis-Gkaragkounis Columbia University, Huck Bennett Columbia University, Anindya De Columbia University, Rocco Servedio Columbia University
A number of recent works (Goldberg, 2006; O’Donnell and Servedio, 2011; De et al., 2017, 2014) have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of “power indices” that characterize the level of control that individual voters have over the final outcome. In the language of theoretical computer science, this is the problem of approximating an unknown linear threshold function (LTF) over {−1, 1}^n given some numerical measure (such as the function’s n “Chow parameters,” a.k.a. its degree-1 Fourier coefficients, or the vector of its n Shapley indices) of how much each of the n individual input variables affects the outcome of the function. In this paper we consider the problem of reconstructing an LTF given only partial information about its Chow parameters or Shapley indices; i.e., we are given only the Chow parameters or the Shapley indices corresponding to a subset S ⊆ [n] of the n input variables. A natural goal in this partial information setting is to find an LTF whose Chow parameters or Shapley indices corresponding to indices in S accurately match the given Chow parameters or Shapley indices of the unknown LTF. We refer to this as the Partial Inverse Power Index Problem. Our main results are a polynomial-time algorithm for the (ε-approximate) Chow Parameters Partial Inverse Power Index Problem and a quasi-polynomial-time algorithm for the (ε-approximate) Shapley Indices Partial Inverse Power Index Problem.
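For concreteness (standard notation, not taken verbatim from the paper), a linear threshold function over {−1, 1}^n and its Chow parameters are
\[
f(x) = \mathrm{sign}\big(w_1 x_1 + \cdots + w_n x_n - \theta\big), \qquad \hat{f}(i) = \mathbb{E}_{x \sim \{-1,1\}^n}\big[f(x)\, x_i\big] \ \text{ for } i \in [n],
\]
and the Partial Inverse Power Index Problem asks for an LTF whose Chow parameters (or Shapley indices) agree, up to ε, with the given values on the subset S of coordinates.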
On the Approximation Power of Two-Layer Networks of Random ReLUs
Daniel Hsu Columbia University, Clayton H Sanford Columbia University, Rocco Servedio Columbia University, Emmanouil Vasileios Vlatakis-Gkaragkounis Columbia University
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper and lower bounds for L2-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ tools from harmonic analysis and ridgelet representation theory, while our lower bounds are based on (robust versions of) dimensionality arguments.
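The networks in question (sketched here following the abstract's description, not the paper's exact notation) compute functions of the form
\[
f(x) = \sum_{i=1}^{N} a_i\, \sigma\big(\langle w_i, x \rangle + b_i\big), \qquad \sigma(z) = \max(z, 0),
\]
where the bottom-level parameters (w_i, b_i) are drawn at random and held fixed, and only the top-level coefficients a_i are chosen to approximate the target function.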
Weak learning convex sets under normal distributions
Anindya De Columbia University, Rocco Servedio Columbia University
This paper addresses the following natural question: can efficient algorithms weakly learn convex sets under normal distributions? Strong learnability of convex sets under normal distributions is well understood, with near-matching upper and lower bounds given in Klivans et al. (2008), but prior to the current work nothing seems to have been known about weak learning. We essentially answer this question, giving near-matching algorithms and lower bounds. For our positive result, we give a poly(n)-time algorithm that can weakly learn the class of convex sets to advantage Ω(1/√n) using only random examples drawn from the background Gaussian distribution. Our algorithm and analysis are based on a new “density increment” result for convex sets, which we prove using tools from isoperimetry. We also give an information-theoretic lower bound showing that O(log(n)/√n) advantage is best possible even for algorithms that are allowed to make poly(n) many membership queries.
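Here “advantage” carries its usual weak-learning meaning (a standard definition, included for context rather than drawn from the paper): a hypothesis h weakly learns a convex set K with advantage γ under the Gaussian if
\[
\Pr_{x \sim N(0, I_n)}\big[\, h(x) = \mathbf{1}_K(x) \,\big] \;\ge\; \tfrac{1}{2} + \gamma,
\]
so the algorithm above achieves γ = Ω(1/√n) from random examples alone, while the lower bound caps the achievable advantage at O(log(n)/√n) even with membership queries.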
Four papers from CS researchers were accepted to the 30th USENIX Security Symposium. The yearly conference highlights research on the security and privacy of computer systems and networks.
Below are the abstracts and links to the accepted papers.
Formally Verified Memory Protection for a Commodity Multiprocessor Hypervisor
Shih-Wei Li Columbia University, Xupeng Li Columbia University, Ronghui Gu Columbia University, Jason Nieh Columbia University, and John Zhuang Hui Columbia University
Hypervisors are widely deployed by cloud computing providers to support virtual machines, but their growing complexity poses a security risk, as large codebases contain many vulnerabilities. We present SeKVM, a layered Linux KVM hypervisor architecture that has been formally verified on multiprocessor hardware. Using layers, we isolate KVM’s trusted computing base into a small core such that only the core needs to be verified to ensure KVM’s security guarantees. Using layers, we model hardware features at different levels of abstraction tailored to each layer of software. Lower hypervisor layers that configure and control hardware are verified using a novel machine model that includes multiprocessor memory management hardware such as multi-level shared page tables, tagged TLBs, and a coherent cache hierarchy with cache bypass support. Higher hypervisor layers that build on the lower layers are then verified using a more abstract and simplified model, taking advantage of layer encapsulation to reduce proof burden. Furthermore, layers provide modularity to reduce verification effort across multiple implementation versions. We have retrofitted and verified multiple versions of KVM on Arm multiprocessor hardware, proving the correctness of the implementations and that they contain no vulnerabilities that can affect KVM’s security guarantees. Our work is the first machine-checked proof for a commodity hypervisor using multiprocessor memory management hardware. SeKVM requires only modest KVM modifications and incurs only modest performance overhead versus unmodified KVM on real application workloads.
Fine Grained Dataflow Tracking with Proximal Gradients
Gabriel Ryan Columbia University, Abhishek Shah Columbia University, Dongdong She Columbia University, Koustubha Bhat Vrije Universiteit Amsterdam, and Suman Jana Columbia University
Dataflow tracking with Dynamic Taint Analysis (DTA) is an important method in systems security with many applications, including exploit analysis, guided fuzzing, and side-channel information leak detection. However, DTA is fundamentally limited by the Boolean nature of taint labels, which provide no information about the significance of detected dataflows and lead to false positives/negatives on complex real world programs.
We introduce proximal gradient analysis (PGA), a novel, theoretically grounded approach that can track more accurate and fine-grained dataflow information. PGA uses proximal gradients, a generalization of gradients for non-differentiable functions, to precisely compose gradients over non-differentiable operations in programs. Composing gradients over programs eliminates many of the dataflow propagation errors that occur in DTA and provides richer information about how each measured dataflow affects a program.
We compare our prototype PGA implementation to three state of the art DTA implementations on 7 real-world programs. Our results show that PGA can improve the F1 accuracy of data flow tracking by up to 33% over taint tracking (20% on average) without introducing any significant overhead (< 5% on average). We further demonstrate the effectiveness of PGA by discovering 22 bugs (20 confirmed by developers) and 2 side-channel leaks, and identifying exploitable dataflows in 19 existing CVEs in the tested programs.
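As a rough illustration of the distinction the abstract draws (a toy sketch under our own simplifying assumptions, not the authors' implementation of PGA), Boolean taint records only whether data flows from an input to an output, while a gradient-style score also captures how strongly it flows:

```python
# Toy contrast between Boolean taint propagation and a gradient-style
# dataflow score. Illustration only: PGA itself composes proximal
# gradients over program operations, which is not shown here.

def taint_propagate(input_tainted: bool) -> bool:
    # Dynamic taint analysis: the output is tainted iff a tainted input
    # reaches it -- a single bit, with no notion of significance.
    return input_tainted

def dataflow_score(f, x: float, eps: float = 1e-6) -> float:
    # Gradient-style tracking: a central finite difference estimates
    # |df/dx|; a near-zero value means x barely influences f(x).
    return abs(f(x + eps) - f(x - eps)) / (2 * eps)

# Hypothetical program fragment: the output depends on the input only
# through a heavily attenuating transformation.
def program(x: float) -> float:
    return 0.001 * (x % 256.0)

if __name__ == "__main__":
    print(taint_propagate(True))            # True: flagged regardless of strength
    print(dataflow_score(program, 1000.0))  # ~0.001: a weak but real dataflow
```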
AdCube: WebVR Ad Fraud and Practical Confinement of Third-Party Ads
Hyunjoo Lee Korea Advanced Institute of Science and Technology, Jiyeon Lee Korea Advanced Institute of Science and Technology, Daejun Kim Korea Advanced Institute of Science and Technology, Suman Jana Columbia University, Insik Shin Korea Advanced Institute of Science and Technology, and Sooel Son Korea Advanced Institute of Science and Technology
Web technology has evolved to offer 360-degree immersive browsing experiences. This new technology, called WebVR, enables virtual reality by rendering a three-dimensional world on an HTML canvas. Unfortunately, there exists no browser-supported way of sharing this canvas between different parties. Assuming an abusive ad service provider who exploits this absence, we present four new ad fraud attack methods. Our user study demonstrates that the success rates of our attacks range from 88.23% to 100%, confirming their effectiveness. To mitigate the presented threats, we propose AdCube, which allows publishers to specify the behaviors of third-party ad code and enforce this specification. We show that AdCube is able to block the presented threats with a small page loading latency of 236 msec and a negligible frame-per-second (FPS) drop for nine WebVR official demo sites.
Cost-Aware Robust Tree Ensembles for Security Applications
Yizheng Chen Columbia University, Shiqi Wang Columbia University, Weifan Jiang Columbia University, Asaf Cidon Columbia University, and Suman Jana Columbia University
There are various costs for attackers to manipulate the features of security classifiers. The costs are asymmetric across features and to the directions of changes, which cannot be precisely captured by existing cost models based on Lp-norm robustness. In this paper, we utilize such domain knowledge to increase the attack cost of evading classifiers, specifically, tree ensemble models that are widely used by security tasks. We propose a new cost modeling method to capture the feature manipulation cost as constraint, and then we integrate the cost-driven constraint into the node construction process to train robust tree ensembles. During the training process, we use the constraint to find data points that are likely to be perturbed given the feature manipulation cost, and we use a new robust training algorithm to optimize the quality of the trees. Our cost-aware training method can be applied to different types of tree ensembles, including gradient boosted decision trees and random forest models. Using Twitter spam detection as the case study, our evaluation results show that we can increase the attack cost by 10.6X compared to the baseline. Moreover, our robust training method using cost-driven constraint can achieve higher accuracy, lower false positive rate, and stronger cost-aware robustness than the state-of-the-art training method using L∞-norm cost model. Our code is available at https://github.com/surrealyz/growtrees.
David Blei and Chong Wang were named winners of the Test of Time Award for Research at the 27th SIGKDD Conference on Knowledge Discovery and Data Mining. The duo was recognized for their 2011 paper, “Collaborative topic modeling for recommending scientific articles.”
Research from the department has been accepted to the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021).
Below are the abstracts and links to the accepted papers.
COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic
Arkadiy Saakyan, Tuhin Chakrabarty, and Smaranda Muresan
We introduce a FEVER-like dataset COVID-Fact of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying whether the evidence refutes or supports a given claim. In addition to scientific claims, our data contains simplified general claims from media sources, making it better suited for detecting general misinformation regarding COVID-19. Our experiments indicate that COVID-Fact will provide a challenging testbed for the development of new systems and our approach will reduce the costs of building domain-specific datasets for detecting misinformation.
Metaphor Generation with Conceptual Mappings
Kevin Stowe, Tuhin Chakrabarty, Nanyun Peng, Smaranda Muresan, and Iryna Gurevych
Generating metaphors is a difficult task as it requires understanding nuanced relationships between abstract concepts. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Guided by conceptual metaphor theory, we propose to control the generation process by encoding conceptual mappings between cognitive domains to generate meaningful metaphoric expressions. To achieve this, we develop two methods: 1) using FrameNet-based embeddings to learn mappings between domains and applying them at the lexical level (CM-Lex), and 2) deriving source/target pairs to train a controlled seq-to-seq generation model (CM-BART). We assess our methods through automatic and human evaluation for basic metaphoricity and conceptual metaphor presence. We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems, and CM-BART outperforms all other models both in automatic and human evaluations.
Weakly-Supervised Methods for Suicide Risk Assessment: Role of Related Domains
Chenghao Yang, Yudong Zhang, and Smaranda Muresan
Social media has become a valuable resource for the study of suicidal ideation and the assessment of suicide risk. Among social media platforms, Reddit has emerged as the most promising one due to its anonymity and its focus on topic-based communities (subreddits) that can be indicative of someone’s state of mind or interest regarding mental health disorders such as r/SuicideWatch, r/Anxiety, r/depression. A challenge for previous work on suicide risk assessment has been the small amount of labeled data. We propose an empirical investigation into several classes of weakly-supervised approaches, and show that using pseudo-labeling based on related issues around mental health (e.g., anxiety, depression) helps improve model performance for suicide risk assessment.
Improving Factual Consistency of Abstractive Summarization via Question Answering
Feng Nan, Cicero Nogueira Dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, and Bing Xiang
A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an approach to address factual consistency in summarization. We first propose an efficient automatic evaluation metric to measure factual consistency; next, we propose a novel learning algorithm that maximizes the proposed metric during model training. Through extensive experiments, we confirm that our method is effective in improving factual consistency and even overall quality of the summaries, as judged by both automatic metrics and human evaluation.
InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection
Yi Fung, Christopher Thomas, Revanth Gangi Reddy, Sandeep Polisetty, Heng Ji, Shih-Fu Chang, Kathleen McKeown, Mohit Bansal, and Avi Sil
To defend against neural system-generated fake news, an effective mechanism is urgently needed. We contribute a novel benchmark for fake news detection at the knowledge element level, as well as a solution for this task which incorporates cross-media consistency checking to detect the fine-grained knowledge elements making news articles misinformative. Due to training data scarcity, we also formulate a novel data synthesis method by manipulating knowledge elements within the knowledge graph to generate noisy training data with specific, hard to detect, known inconsistencies. Our detection approach outperforms the state-of-the-art (up to 16.8% absolute accuracy gain), and more critically, yields fine-grained explanations.
Cross-language Sentence Selection via Data Augmentation and Rationale Training
Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuscakova, Rui Zhang, Douglas Oard, and Kathleen McKeown
This paper proposes an approach to cross-language sentence selection in a low-resource setting. It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model. Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data. Moreover, when a rationale training secondary objective is applied to encourage the model to match word alignment hints from a phrase-based statistical machine translation model, consistent improvements are seen across three language pairs (English-Somali, English-Swahili, and English-Tagalog) over a variety of state-of-the-art baselines.
SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues
Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, and Song-Chun Zhu
Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly. We model the social network as an And-or Graph, named SocAoG, for the consistency of relations among a group and leveraging attributes as inference cues. Moreover, we formulate a sequential structure prediction task, and propose an α–β–γ strategy to incrementally parse SocAoG for the dynamic inference upon any incoming utterance: (i) an α process predicting attributes and relations conditioned on the semantics of dialogues, (ii) a β process updating the social relations based on related attributes, and (iii) a γ process updating individual’s attributes based on interpersonal social relations. Empirical results on DialogRE and MovieGraph show that our model infers social relations more accurately than the state-of-the-art methods. Moreover, the ablation study shows the three processes complement each other, and the case study demonstrates the dynamic relational inference.
HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations
Weixin Liang, Kai-Hui Liang, and Zhou Yu
Open-domain dialog systems have a user-centric goal: to provide humans with an engaging conversation experience. User engagement is one of the most important metrics for evaluating open-domain dialog systems, and could also be used as real-time feedback to benefit dialog policy learning. Existing work on detecting user disengagement typically requires hand-labeling many dialog samples. We propose HERALD, an efficient annotation framework that reframes the training data annotation process as a denoising problem. Specifically, instead of manually labeling training samples, we first use a set of labeling heuristics to label training samples automatically. We then denoise the weakly labeled data using the Shapley algorithm. Finally, we use the denoised data to train a user engagement detector. Our experiments show that HERALD improves annotation efficiency significantly and achieves 86% user disengagement detection accuracy in two dialog corpora. Our implementation is available at https://github.com/Weixin-Liang/HERALD/.
Towards Emotional Support Dialog Systems
Siyang Liu, Chujie Zheng, Orianna Demasi, Sahand Sabour, Yu Li, Zhou Yu, Yong Jiang, and Minlie Huang
Emotional support is a crucial ability for many conversation scenarios, including social interactions, mental health support, and customer service chats. Following reasonable procedures and using various support skills can help to effectively provide support. However, due to the lack of a well-designed task and corpora of effective emotional support conversations, research on building emotional support into dialog systems remains untouched. In this paper, we define the Emotional Support Conversation (ESC) task and propose an ESC Framework, which is grounded on the Helping Skills Theory (Hill, 2009). We construct an Emotion Support Conversation dataset (ESConv) with rich annotation (especially support strategy) in a help-seeker and supporter mode. To ensure a corpus of high-quality conversations that provide examples of effective emotional support, we take extensive effort to design training tutorials for supporters and several mechanisms for quality control during data collection. Finally, we evaluate state-of-the-art dialog models with respect to the ability to provide emotional support. Our results show the importance of support strategies in providing effective emotional support and the utility of ESConv in training more emotional support systems.
Discovering Dialogue Slots with Weak Supervision
Vojtěch Hudeček, Ondřej Dušek, and Zhou Yu
Task-oriented dialogue systems typically require manual annotation of dialogue slots in training data, which is costly to obtain. We propose a method that eliminates this requirement: We use weak supervision from existing linguistic annotation models to identify potential slot candidates, then automatically identify domain-relevant slots by using clustering algorithms. Furthermore, we use the resulting slot annotation to train a neural-network-based tagger that is able to perform slot tagging with no human intervention. This tagger is trained solely on the outputs of our method and thus does not rely on any labeled data. Our model demonstrates state-of-the-art performance in slot tagging without labeled training data on four different dialogue domains. Moreover, we find that slot annotations discovered by our model significantly improve the performance of an end-to-end dialogue response generation model, compared to using no slot annotation at all.
The R-U-A-Robot Dataset: Helping Avoid Chatbot Deception by Detecting User Questions About Human or Non-Human Identity
David Gros, Yu Li, and Zhou Yu
Humans are increasingly interacting with machines through language, sometimes in contexts where the user may not know they are talking to a machine (like over the phone or a text chatbot). We aim to understand how system designers and researchers might allow their systems to confirm its non-human identity. We collect over 2,500 phrasings related to the intent of “Are you a robot?”. This is paired with over 2,500 adversarially selected utterances where only confirming the system is non-human would be insufficient or disfluent. We compare classifiers to recognize the intent and discuss the precision/recall and model complexity tradeoffs. Such classifiers could be integrated into dialog systems to avoid undesired deception. We then explore how both a generative research model (Blender) as well as two deployed systems (Amazon Alexa, Google Assistant) handle this intent, finding that systems often fail to confirm their nonhuman identity. Finally, we try to understand what a good response to the intent would be, and conduct a user study to compare the important aspects when responding to this intent.
On the Generation of Medical Dialogs for COVID-19
Meng Zhou, Zechen Li, Bowen Tan, Guangtao Zeng, Wenmian Yang, Xuehai He, Zeqian Ju, Subrato Chakravorty, Shu Chen, Xingyi Yang, Yichen Zhang, Qingyang Wu, Zhou Yu, Kun Xu, Eric Xing, and Pengtao Xie
Under the pandemic of COVID-19, people experiencing COVID19-related symptoms or exposed to risk factors have a pressing need to consult doctors. Due to hospital closure, a lot of consulting services have been moved online. Because of the shortage of medical professionals, many people cannot receive online consultations timely. To address this problem, we aim to develop a medical dialogue system that can provide COVID19-related consultations. We collected two dialogue datasets – CovidDialog – (in English and Chinese respectively) containing conversations between doctors and patients about COVID-19. On these two datasets, we train several dialogue generation models based on Transformer, GPT, and BERT-GPT. Since the two COVID-19 dialogue datasets are small in size, which bear high risk of overfitting, we leverage transfer learning to mitigate data deficiency. Specifically, we take the pretrained models of Transformer, GPT, and BERT-GPT on dialog datasets and other large-scale texts, then finetune them on our CovidDialog datasets. Experiments demonstrate that these approaches are promising in generating meaningful medical dialogues about COVID-19. But more advanced approaches are needed to build a fully useful dialogue system that can offer accurate COVID-related consultations. The data and code are available at https://github.com/UCSD-AI4H/COVID-Dialogue
PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation
Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi, and Zhou Yu
The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors in downstream dialog generation tasks. However, the performance of pre-trained models on the dialog task is not as optimal as expected. In this paper, we propose a Pre-trained Role Alternating Language model (PRAL), designed specifically for task-oriented conversational systems. We adopt ARDM (Wu et al., 2019), which models the two speakers separately. We also design several techniques, such as start position randomization, knowledge distillation, and history discount to improve pre-training performance. We introduce a task-oriented dialog pre-training dataset by cleaning 13 existing datasets. We test PRAL on three different downstream tasks. The results show that PRAL performs better than or on par with state-of-the-art methods.
Associate Professor Xi Chen was awarded the 2021 Delbert Ray Fulkerson Prize for his paper “Complexity of Counting CSP with Complex Weights,” published in the Journal of the Association for Computing Machinery in 2017. The award recognizes outstanding papers in discrete mathematics and is presented at each (triennial) International Symposium on Mathematical Programming.