Research from the department was presented at the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2021). The conference is a forum for academic and industry researchers to discuss their work on discourse and dialogue, including discourse processing, dialogue systems, corpora, tools, and methodology.
Professor Julia Hirschberg was one of the invited keynote speakers. In her lecture, she discussed how computer systems can encourage user trust in recommender systems, knowledge-delivery systems, and dialogue systems.
Below are the links to the accepted papers and abstracts.
What to Fact-Check: Guiding Check-Worthy Information Detection in News Articles Through Argumentative Discourse Structure
Tariq Alhindi (Columbia University), Brennan McManus (Columbia University), and Smaranda Muresan (Columbia University)
Most existing methods for automatic fact checking start with a precompiled list of claims to verify. We investigate the understudied problem of determining which statements in news articles are worth fact-checking. We annotate the argument structure of 95 news articles in the climate change domain that are fact-checked by climate scientists at climatefeedback.org. We release the first multi-layer annotated corpus for both argumentative discourse structure (argument components and relations) and for fact-checked statements in news articles. We discuss the connection between argument structure and check-worthy statements and develop several baseline models for detecting check-worthy statements in the climate change domain. Our preliminary results show that using information about argumentative discourse structure yields a slight but statistically significant improvement over a baseline of local discourse structure.
Improving Named Entity Recognition in Spoken Dialog Systems By Context and Speech Pattern Modeling
Minh Nguyen (University of California, Davis) and Zhou Yu (Columbia University)
While named entity recognition (NER) from speech has been around as long as NER from written text has, the accuracy of NER from speech has generally been much lower than that of NER from text. The rise in popularity of spoken dialog systems such as Siri or Alexa highlights the need for more accurate NER from speech because NER is a core component for understanding what users said in dialogs. Deployed spoken dialog systems receive user input in the form of automatic speech recognition (ASR) transcripts, and simply applying an NER model trained on written text to ASR transcripts often leads to low accuracy because, compared to written text, ASR transcripts lack important cues such as punctuation and capitalization. In addition, errors in ASR transcripts make NER from speech challenging. We propose two models that exploit dialog context and speech pattern clues to extract named entities more accurately from open-domain dialogs in spoken dialog systems. Our results show the benefit of modeling dialog context and speech patterns in two settings: a standard setting with a random partition of the data and a more realistic but also more difficult setting where many named entities encountered during deployment are unseen during training.
Evaluation of In-Person Counseling Strategies to Develop Physical Activity Chatbot for Women
Kai-Hui Liang (Columbia University), Patrick Lange (University of California, Davis), Yoo Jung Oh (University of California, Davis), Jingwen Zhang (University of California, Davis), Yoshimi Fukuoka (University of California, San Francisco), and Zhou Yu (Columbia University)
Artificial intelligence chatbots are the vanguard in technology-based intervention to change people’s behavior. To develop intervention chatbots, the first step is to understand natural language conversation strategies in human conversation. This work introduces an intervention conversation dataset collected from a real-world physical activity intervention program for women. We designed comprehensive annotation schemes in four dimensions (domain, strategy, social exchange, and task-focused exchange) and annotated a subset of dialogs. We built a strategy classifier with context information to detect strategies from both trainers and participants based on the annotation. To understand how human intervention induces effective behavior changes, we analyzed the relationships between the intervention strategies and the participants’ changes in the barrier and social support for physical activity. We also analyzed how the participant’s baseline weight correlates with the number of occurrences of the corresponding strategy. This work lays the foundation for developing a personalized physical activity intervention bot.
DialogStitch: Synthetic Deeper and Multi-Context Task-Oriented Dialogs
Satwik Kottur (Facebook AI), Chinnadhurai Sankar (Facebook AI), Zhou Yu (Columbia University), and Alborz Geramifard (Facebook AI)
Real-world conversational agents must effectively handle long conversations that span multiple contexts. Such context can be interspersed with chitchat (dialog turns not directly related to the task at hand), and potentially grounded in a multimodal setting. While prior work focused on the above aspects in isolation, there is a lack of a unified framework that studies them together. To overcome this, we propose DialogStitch, a novel framework to seamlessly ‘stitch’ multiple conversations and highlight these desirable traits in a task-oriented dialog. After stitching, our dialogs are provably deeper, contain longer-term dependencies, and span multiple contexts when compared with the source dialogs, all by leveraging existing human annotations! Though our framework generalizes to a variety of combinations, we demonstrate its benefits in two settings: (a) multimodal, image-grounded conversations, and (b) task-oriented dialogs fused with chit-chat conversations. We benchmark state-of-the-art dialog models on our datasets and find accuracy drops of (a) 12% and (b) 45% respectively, indicating the additional challenges in the stitched dialogs. Our code and data are publicly available.
Annotation Inconsistency and Entity Bias in MultiWOZ
Kun Qian (Columbia University), Ahmad Beirami (Facebook AI), Zhouhan Lin (Facebook AI), Ankita De (Facebook AI), Alborz Geramifard (Facebook AI), Zhou Yu (Columbia University), and Chinnadhurai Sankar (Facebook AI)
MultiWOZ (Budzianowski et al., 2018) is one of the most popular multi-domain task-oriented dialog datasets, containing 10K+ annotated dialogs covering eight domains. It has been widely accepted as a benchmark for various dialog tasks, e.g., dialog state tracking (DST), natural language generation (NLG) and end-to-end (E2E) dialog modeling. In this work, we identify an overlooked issue with dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs, leading to confusion for DST modeling. We propose an automated correction for this issue, which is present in 70% of the dialogs. Additionally, we notice that there is significant entity bias in the dataset (e.g., “cambridge” appears in 50% of the destination cities in the train domain). The entity bias can potentially lead to named entity memorization in generative models, which may go unnoticed as the test set suffers from a similar entity bias as well. We release a new test set with all entities replaced with unseen entities. Finally, we benchmark joint goal accuracy (JGA) of the state-of-the-art DST baselines on these modified versions of the data. Our experiments show that the annotation inconsistency corrections lead to a 7-10% improvement in JGA. On the other hand, we observe a 29% drop in JGA when models are evaluated on the new test set with unseen entities.
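For readers unfamiliar with the metric, joint goal accuracy is commonly computed as the fraction of dialog turns whose predicted state matches the gold state on every slot. The following is a minimal sketch under that common definition, assuming each dialog state is a plain slot-to-value dictionary. The function name, slot names, and sample values are illustrative assumptions and are not taken from the paper or its released code.

```python
from typing import Dict, List

# Assumed representation: one dialog state per turn, mapping slot name -> value.
DialogState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Fraction of turns whose predicted state equals the gold state on all slots."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of turns")
    if not gold:
        return 0.0
    # A turn counts as correct only if every slot and value matches exactly.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Made-up example: three turns, the last prediction gets one slot value wrong.
gold_states = [
    {"train-destination": "cambridge"},
    {"train-destination": "cambridge", "train-day": "monday"},
    {"train-destination": "london", "train-day": "monday"},
]
predicted_states = [
    {"train-destination": "cambridge"},
    {"train-destination": "cambridge", "train-day": "monday"},
    {"train-destination": "cambridge", "train-day": "monday"},
]
score = joint_goal_accuracy(predicted_states, gold_states)  # 2 of 3 turns match
```

Because a single wrong slot value makes the whole turn count as incorrect, a model that has memorized a frequent entity can lose accuracy sharply once that entity is replaced with an unseen one, as the made-up third turn illustrates.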