Columbia University NLP Seminar

The seminar is a series of talks by both invited faculty and student speakers. All are welcome to attend.

This semester (Fall 2022) the standard time is 4:00-5:00pm ET on Mondays. However, several talks are scheduled on different days or at different times, so please check the days and times carefully on the calendar below so you don't miss any talks!

The seminar will be hybrid this semester. In-person meetings will be held in the CS Conference room (unless noted otherwise). The Zoom link will be sent out to the NLP mailing list. If you are not on the mailing list but would like the link, please email us.

The seminar is organized by Emily Allaway. Please contact me with any questions.

Upcoming Talk: Alan Ritter (MON October 17 @ 3:00pm ET)
Towards Economically Efficient Use of Pre-Trained Language Models

Abstract: Large language models are leading to breakthroughs in a variety of applications, from information extraction systems that are accurate and robust, to human-like conversational assistants. In this talk I will present an analysis of when the benefits of pre-training a new model outweigh the computational costs. Conventional wisdom holds that data annotation is expensive, so computational methods that leverage freely available unlabeled data can present an economical alternative when adapting to a new domain. The talk will examine this assumption in the context of pretraining-based domain adaptation, which requires significant GPU/TPU resources for each new domain. We frame domain adaptation as a consumer choice problem: given a fixed budget, what combination of annotation and pre-training leads to maximum utility? In the second part of the talk, I will discuss recent work on in-context learning for anaphora resolution. I will show that resolving anaphora in the wild is a challenging task for in-context learning, then present a new method, MICE (Mixtures of In-Context Experts), and demonstrate how it accurately resolves multiple-antecedent anaphora in scientific protocols. MICE enables accurate few-shot anaphora resolution by ensembling hundreds of prompts that are created from only a handful of training examples. Finally, I will discuss applications of NLP to chemical synthesis protocols and show a demo of a system that can help chemists more efficiently find experimental details described in the literature.

Bio: Alan Ritter is an associate professor in the School of Interactive Computing at Georgia Tech. His research interests include natural language processing, information extraction, and machine learning. He completed his Ph.D. at the University of Washington and was a postdoctoral fellow in the Machine Learning Department at Carnegie Mellon University. His research aims to solve challenging technical problems that can help machines learn to read vast quantities of text with minimal supervision. In a recent project, covered by WIRED (https://www.wired.com/story/machine-learning-tweets-critical-security-flaws/), his group built a system that reads millions of tweets for mentions of new software vulnerabilities. Alan is the recipient of an NSF CAREER award and an Amazon Research Award.

Fall 2022



Spring 2022



Fall 2021



Spring 2021



Fall 2020