With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance, as the underlying privacy notion is overly pessimistic and provides undifferentiated protection for all tokens in the data. Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, which provides rigorous privacy guarantees on the sensitive portion of the data in order to improve model utility. To realize this notion, we develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based language models. Besides language modeling, we also apply the method to a more concrete application: dialog systems. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utility than the baselines while remaining safe under various privacy attacks. The data and code are released at this HTTPS URL to facilitate future research.
Knowledge-grounded dialogue systems are challenging to build due to the lack of training data and heterogeneous knowledge sources. Existing systems perform poorly on unseen topics due to limited topics covered in the training data. In addition, heterogeneous knowledge sources make it challenging for systems to generalize to other tasks because knowledge sources in different knowledge representations require different knowledge encoders. To address these challenges, we present PLUG, a language model that homogenizes different knowledge sources to a unified knowledge representation for knowledge-grounded dialogue generation tasks. PLUG is pre-trained on a dialogue generation task conditioned on a unified essential knowledge representation. It can generalize to different downstream knowledge-grounded dialogue generation tasks with a few training examples. The empirical evaluation on two benchmarks shows that our model generalizes well across different knowledge-grounded tasks. It can achieve comparable performance with state-of-the-art methods under a fully-supervised setting and significantly outperforms other methods in zero-shot and few-shot settings.
As task-oriented dialog systems are becoming increasingly popular in our lives, more realistic tasks have been proposed and explored. However, new practical challenges arise. For instance, current dialog systems cannot effectively handle multiple search results when querying a database, due to the lack of such scenarios in existing public datasets. In this paper, we propose Database Search Result (DSR) Disambiguation, a novel task that focuses on disambiguating database search results, which enhances user experience by allowing users to choose from multiple options instead of just one. To study this task, we augment the popular task-oriented dialog datasets (MultiWOZ and SGD) with turns that resolve ambiguities by (a) synthetically generating turns through a pre-defined grammar, and (b) collecting human paraphrases for a subset. We find that training on our augmented dialog data improves the model’s ability to deal with ambiguous scenarios, without sacrificing performance on unmodified turns. Furthermore, pre-fine-tuning and multi-task learning help our model improve performance on DSR disambiguation even in the absence of in-domain data, suggesting that it can be learned as a universal dialog skill. Our data and code will be made publicly available.
Currently available grammatical error correction (GEC) datasets are compiled using well-formed written text, limiting the applicability of these datasets to other domains such as informal writing and dialog. In this paper, we present a novel parallel GEC dataset drawn from open-domain chatbot conversations; this dataset is, to our knowledge, the first GEC dataset targeted to a conversational setting. To demonstrate the utility of the dataset, we use our annotated data to fine-tune a state-of-the-art GEC model, resulting in a 16-point increase in model precision. This matters particularly for GEC, where precision is considered more important than recall because false positives could seriously confuse language learners. We also present a detailed annotation scheme which ranks errors by perceived impact on comprehensibility, making our dataset both reproducible and extensible. Experimental results show the effectiveness of our data in improving GEC model performance in conversational scenarios.
Conversational recommendation systems (CRS) engage with users by inferring user preferences from dialog history, providing accurate recommendations, and generating appropriate responses. Previous CRSs use knowledge graph (KG) based recommendation modules and integrate KGs with language models for response generation. Although KG-based approaches prove effective, two issues remain to be solved. First, KG-based approaches ignore the information in the conversational context and rely only on entity relations and bags of words to recommend items. Second, maintaining KGs that model domain-specific relations requires substantial engineering effort, which reduces flexibility. In this paper, we propose a simple yet effective architecture comprising a pre-trained language model (PLM) and an item metadata encoder. The encoder learns to map item metadata to embeddings that can reflect the semantic information in the dialog context. The PLM then consumes the semantically aligned item embeddings together with the dialog context to generate high-quality recommendations and responses. Instead of modeling entity relations with KGs, our model reduces engineering complexity by directly converting each item to an embedding. Experimental results on the benchmark dataset ReDial show that our model obtains state-of-the-art results on both recommendation and response generation tasks.
Recent large-scale natural language processing (NLP) systems use a Large Language Model (LLM) pre-trained on massive and diverse corpora as a head start. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data, thereby potentially revealing private information processed during pre-training. The potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy-to-interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide a theoretical analysis showing that the proposed mechanism is differentially private, and experimental results show a privacy-utility trade-off.
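To make the idea of a decoding-stage perturbation concrete, here is a minimal sketch. The abstract does not say which perturbation is used, so the choice below (mixing the model's next-token distribution with the uniform distribution) is an assumption made purely for illustration, and the names `perturb_distribution`, `sample_token`, and `lam` are hypothetical.

```python
import random


def perturb_distribution(probs, lam):
    """Mix a next-token distribution with the uniform distribution.

    Assumed mechanism, for illustration only:
        q'[i] = lam * q[i] + (1 - lam) / vocab_size
    lam = 1.0 leaves the model output unchanged; lam = 0.0 is pure uniform.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    uniform = 1.0 / len(probs)
    return [lam * p + (1.0 - lam) * uniform for p in probs]


def sample_token(probs, rng):
    """Sample a token index from the (perturbed) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy three-token vocabulary; the probabilities are made up.
model_probs = [0.7, 0.2, 0.1]
private_probs = perturb_distribution(model_probs, lam=0.5)
token = sample_token(private_probs, random.Random(0))
```

In this sketch a smaller `lam` pushes the output toward uniform, so it depends less on what the model memorized but is also less useful, which is one way to picture the privacy-utility trade-off the abstract mentions. No modification of the trained model's weights is involved; only the distribution used at decoding time changes.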
The annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL) is the preeminent event in the field of natural language processing. CS researchers in Professor Julia Hirschberg’s group won a Best Paper award for a novel resource, SpatialNet, which provides a formal representation of how a language expresses spatial relations. Other accepted papers are detailed below.
The researchers identified and analyzed unique linguistic characteristics of Reddit posts written by users who claim to have received a diagnosis for schizophrenia. The findings were interpreted in the context of established schizophrenia symptoms and compared with results from previous research that has looked at schizophrenia and language on social media platforms.
The results showed several differences in language usage between users with schizophrenia and a control group. For example, people with schizophrenia used less punctuation in their Reddit posts. Disorganized language use is a prominent and common symptom of schizophrenia.
A machine learning classifier was trained to automatically identify self-identified users with schizophrenia on Reddit, using linguistic cues.
“We hope that this work contributes toward the ultimate goal of identifying high risk individuals,” said Sara Ita Levitan, a postdoctoral research scientist with the Spoken Language Processing Group. “Early intervention and diagnosis is important to improve overall treatment outcomes for schizophrenia.”
For many people, social media is a primary source of information, and it can become a key venue for opinionated discussion. In order to evaluate and analyze these discussions, it is important to understand contrast, that is, differences in opinion.
As a step towards a better understanding of arguments, the researchers developed a method to automatically generate responses to internet comments containing differences in stance. They created a corpus from over one million contrastive claims mined from the social media site Reddit. In order to obtain training data for the models, they extracted pairs of comments containing the acronym FTFY (“fixed that for you”).
For example, in a discussion over who should be the next President of the United States, one participant might state “Bernie Sanders for president” and another might state “Hillary Clinton for president. FTFY”
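The FTFY pairing described above can be sketched as follows. This is only an illustration under assumed conventions: the comment record layout (`id`, `parent_id`, `text`), the function name `extract_ftfy_pairs`, and the rule that FTFY must close the reply are all hypothetical, not the researchers' actual pipeline.

```python
import re

# Matches a trailing "FTFY" marker, optionally followed by "." or "!".
FTFY_MARKER = re.compile(r"\s*\bFTFY\b[.!]*\s*$", re.IGNORECASE)


def extract_ftfy_pairs(comments):
    """Return (original_claim, contrastive_claim) pairs from FTFY replies.

    `comments` is a list of dicts with hypothetical keys
    "id", "parent_id" (None for top-level comments), and "text".
    """
    by_id = {c["id"]: c for c in comments}
    pairs = []
    for comment in comments:
        marker = FTFY_MARKER.search(comment["text"])
        parent = by_id.get(comment.get("parent_id"))
        if marker is None or parent is None:
            continue
        # Drop the marker and keep the rewritten claim.
        edited = comment["text"][:marker.start()].strip()
        original = parent["text"].strip()
        if edited and edited != original:
            pairs.append((original, edited))
    return pairs


thread = [
    {"id": "a", "parent_id": None, "text": "Bernie Sanders for president"},
    {"id": "b", "parent_id": "a", "text": "Hillary Clinton for president. FTFY"},
    {"id": "c", "parent_id": "a", "text": "I agree completely."},
]
pairs = extract_ftfy_pairs(thread)
```

Each resulting pair holds the parent comment as the original claim and the FTFY reply, minus the marker, as the contrastive claim.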
A neural network model was trained on the pairs to edit the original claim and produce a new claim with a different view.
Claim: Bernie Sanders for president
New claim: Hillary Clinton for president.
“One aspect of this problem that was surprising was that the standard ‘sequence-to-sequence with attention’ baseline performed poorly, often just copying the output or selecting generic responses,” said Christopher Hidey, a fourth year PhD student. While generic response generation is a known problem in neural models, their custom model significantly outperformed this baseline in several metrics including novelty and overlap with human-generated responses.
The researchers developed an automatic summarization system that specializes in producing English summaries for documents originally written in three low-resource languages – Somali, Swahili, and Tagalog.
Little natural language processing work has been done on low-resource languages, and machine translation systems for those languages are of lower quality than those for high-resource languages like French or German.
As a result, the translations are often disfluent and contain errors that make them difficult for a human to understand, much less for a summarization system to process.
An example of a machine-translated document originally written in Swahili:
Mange Kimambi ‘I pray for the parliamentary seat for Kinondoni constituency for ticket of CCM. Not special seats’ Kinondoni without drugs is possible I pray for the parliamentary seat for Kinondoni constituency on the ticket of CCM. Yes, it’s not a special seats, Khuini Kinondoni, what will I do for Kinondoni? Tension is many I get but we must remember no good that is available easily. Kinondoni without drugs is possible. As a friend, fan or patriotism I urge you to grant your contribution to the situation and propert. You can use Western Union or money to go to Mange John Kimambi. Account of CRDB Bank is on blog. Reduce my profile in my blog understand why I have decided to vie for Kinondoni constituency. you will understand more.
A standard summarization system’s output on the document: Mange Kimambi, who pray for parliamentary seat for Kinondoni constituency for ticket of CCM, is on blog, and not special seats’ Kinondoni without drugs.
The robust summarization system’s output on the document: Mange Kimambi, who pray for parliamentary seat for Kinondoni constituency for ticket of CCM, comments on his plans to vie for ‘Kinondoni’ without drugs.
“We addressed this challenge by creating large collections of synthetic, errorful ‘translations’ that mimic the output of low-quality machine translations,” said Jessica Ouyang, a seventh-year PhD student. The researchers paired the problematic text with high-quality, human-written summaries. The experiment showed that a neural network summarizer trained on this synthetic data was able to correct or elide translation errors and produce fluent English summaries. The error-correcting ability of the system extends to Arabic, a language previously unseen by the system.
Argument mining, or argumentation mining, is a research area within the natural language processing field. Argument mining is applied in many different genres including the qualitative assessment of social media content (e.g. Twitter, Facebook) – where it provides a powerful tool for policy-makers and researchers in social and political sciences – legal documents, product reviews, scientific articles, online debates, newspaper articles, and dialogical domains. One of the main tasks of argument mining is to detect a claim.
Claims are the central component of an argument. Detecting claims across different domains or data sets can often be challenging because claims are conceptualized differently from one data set to another. The researchers set out to alleviate this problem by fine-tuning a language model. They created a corpus mined from Reddit that is composed of 5.5 million opinionated claims. These claims are self-labeled by their authors using the internet acronyms IMO (“in my opinion”) or IMHO (“in my humble opinion”).
By fine-tuning the language model on the IMHO dataset, they obtained a significant improvement in claim detection on the data sets they evaluated. As these data sets include diverse domains such as social media and student essays, this improvement demonstrates the robustness of fine-tuning on this novel corpus.
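The self-labeling idea can be pictured with a small sketch that pulls out sentences carrying an IMO/IMHO marker. The sentence splitter, the decision to strip the marker from the extracted claim, and the function names are assumptions for illustration; the article does not describe the actual extraction rules.

```python
import re

# Self-labeling markers, matched as whole words in any letter case.
OPINION_MARKER = re.compile(r"\b(?:IMHO|IMO)\b[,:]?\s*", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def mine_claims(comment):
    """Return the sentences the author marked with IMO/IMHO, marker removed."""
    claims = []
    for sentence in split_sentences(comment):
        if OPINION_MARKER.search(sentence):
            claim = OPINION_MARKER.sub("", sentence).strip()
            if claim:
                claims.append(claim)
    return claims


comment = "I saw it last night. IMHO, the sequel is better than the original. What about you?"
claims = mine_claims(comment)
```

Because the authors themselves flag these sentences as opinions, no manual annotation is needed to collect them, which is what makes a corpus of millions of claims feasible.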
Community Question Answering forums such as Yahoo! Answers and Quora are popular nowadays, as they represent effective means for communities to share information around particular topics. But the information often shared on these forums may be incorrect or misleading.
The paper presents the ColumbiaNLP submission for the SemEval-2019 Task 8: Fact-Checking in Community Question Answering Forums. The researchers show how fine-tuning a language model on a large unannotated corpus of old threads from the Qatar Living forum helps to classify question types (factual, opinion, socializing) and to judge the factuality of answers on the shared task labeled data from the same forum. Their system finished 4th and 2nd on Subtask A (question type classification) and B (answer factuality prediction), respectively, based on the official metric of accuracy.
Question classification
Factual: The question is asking for factual information, which can be answered by checking various information sources, and it is not ambiguous. e.g. “What is Ooredoo customer service number?”
Opinion: The question asks for an opinion or advice, not for a fact. e.g. “Can anyone recommend a good Vet in Doha?”
Socializing: Not a real question, but intended for socializing or for chatting. This can also mean expressing an opinion or sharing some information, without really asking anything of general interest. e.g. “What was your first car?”
Answer classification
Factual – TRUE: The answer is True and can be proven with an external resource.
Q: “I wanted to know if there were any specific shots and vaccinations I should get before coming over [to Doha].”
A: “Yes there are; though it varies depending on which country you come from. In the UK; the doctor has a list of all countries and the vaccinations needed for each.”
Factual – FALSE: The answer gives a factual response, but it is False, it is partially false, or the responder is unsure about it.
Q: “Can I bring my pitbulls to Qatar?”
A: “Yes, you can bring it but be careful this kind of dog is very dangerous.”
Non-Factual: The answer does not provide factual information in response to the question; it can be an opinion or advice that cannot be verified. e.g. “It’s better to buy a new one.”
“We show that fine-tuning a language model on a large unsupervised corpus from the same community forum helps us achieve better accuracy for question classification,” said Tuhin Chakrabarty, lead researcher of the paper. Most community question-answering forums have such unlabeled data, which can be used in the absence of large labeled training data.
For answer classification they show how to leverage information from previously answered questions on the thread through language model fine-tuning. Their experiments also show that modeling an answer individually is not the best idea for fact-verification and results are improved when considering the context of the question.
“Determining factuality of answers requires modeling of world knowledge or external evidence – the questions asked are often very noisy and require reformulation,” shared Chakrabarty. “As a future step we would want to incorporate external evidence from the internet in the factual answer classification problem.”
The paper studied dialogue act classification in therapy transcripts. Dialogue act classification is a task in which the researchers attempt to determine the intention of the speaker at each point in a dialogue, classifying it into one of a fixed number of possible types.
This provides a layer of abstraction away from what the speaker is literally saying, giving a higher-level view of the conversation. Ultimately, they hope this work can help analyze the dynamics of text-based therapy on a large scale.
Transcripts of therapy sessions were examined, focusing on the speech of the therapist, using a classification scheme developed for this purpose. On a sentence-by-sentence basis, they found which of the labels best matches the conversational “action” the sentence takes.
For example, a therapist might say, “It almost feels like if you could do something, anything would be better than…” This statement would be classified into the Reflection category, as it rephrases or restates the experience the client just described, but in a way that makes what they are feeling more explicit.
“One of the interesting results from this research came when we analyzed the performance of our best classifier across different styles of therapy,” said Fei-Tzin Lee, a third year PhD student. Certain styles were markedly easier to classify than others; this was not simply a case where the classifier performed better on therapeutic styles for which there was more data.
Generally, it seemed that therapy styles involving more complex sentence structure were more difficult to classify, although to fully understand the differences between styles further work would be necessary. Continued Lee, “Regardless of the reason, it was interesting to note that there are marked differences that are quantitatively measurable between different styles.”
Dean Boyce's statement on amicus brief filed by President Bollinger
President Bollinger announced that Columbia University along with many other academic institutions (sixteen, including all Ivy League universities) filed an amicus brief in the U.S. District Court for the Eastern District of New York challenging the Executive Order regarding immigrants from seven designated countries and refugees. Among other things, the brief asserts that “safety and security concerns can be addressed in a manner that is consistent with the values America has always stood for, including the free flow of ideas and people across borders and the welcoming of immigrants to our universities.”
This recent action provides a moment for us to collectively reflect on our community within Columbia Engineering and the importance of our commitment to maintaining an open and welcoming community for all students, faculty, researchers and administrative staff. As a School of Engineering and Applied Science, we are fortunate to attract students and faculty from diverse backgrounds, from across the country, and from around the world. It is a great benefit to be able to gather engineers and scientists of so many different perspectives and talents – all with a commitment to learning, a focus on pushing the frontiers of knowledge and discovery, and with a passion for translating our work to impact humanity.
I am proud of our community, and wish to take this opportunity to reinforce our collective commitment to maintaining an open and collegial environment. We are fortunate to have the privilege to learn from one another, and to study, work, and live together in such a dynamic and vibrant place as Columbia.
Sincerely,
Mary C. Boyce
Dean of Engineering
Morris A. and Alma Schapiro Professor