The seminar features both invited faculty talks and student presentations. All are welcome to attend.
This semester (Spring 2022) the standard time is 4:00-5:00pm ET on Monday. However, several talks are scheduled on different days or at different times, so please check the calendar below carefully so you don't miss any talks!
The seminar will be hybrid this semester. In-person meetings will be held in the CS Conference room (unless noted otherwise). The Zoom link will be sent out to the NLP mailing list. If you are not on the mailing list but would like the link, please email us.
The seminar is co-organized by Emily Allaway and Fei-Tzin Lee. Please contact us with any questions.
Abstract: Large language models have a substantial capacity for high-level analogical reasoning: reproducing patterns in text that occur in their training data or in a provided context. This has led to the widespread assumption that LLMs have absorbed into their parameters the capacity for abstract "reasoning", if not through the proximal next-token prediction that they are trained to perform, then at least through chains of tokens that intervene between a question and answer. In this talk, I will present evidence that this belief is limiting, but that understanding where it fails can inspire efficient and interpretable algorithms that more closely resemble human thinking. In particular, while LLMs are unable to solve tasks that require reasoning over several objects or hypotheses, these tasks become easy when probabilistic inference -- such as clustering, marginalization, and fitting of simple latent variable models -- is performed over the strings or probabilities output by parallel calls to an LLM. Approaches that perform such inference achieve near-human-level performance on a number of benchmarks, including for logical deduction, forming semantic associations, and hallucination detection, where prompting tricks for LLMs fail. These findings point to the flaws of entrusting all reasoning to the mechanism of self-attention in linear text. While large LLMs may compress more and more difficult functions that require retrieval of associations and synthesis of solutions, there is always the next frontier of problems where a trained model with remarkable "intuition" needs to be slowed down; I will argue that more is possible with LLMs of existing scale when they are used in concert with a wise controller that allows for probabilistic inference. (Based on a paper at ACL 2022 and ongoing work.)
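To give a flavor of the kind of inference the abstract describes, here is a minimal sketch of marginalizing over the answers produced by parallel LLM calls: sample several completions, cluster them by exact string match, and return the mode with its empirical probability. The `sample_llm` function is a hypothetical stand-in (not an API from the talk or the paper), and this simple majority-vote scheme is only an illustration of the general idea, not the speaker's actual algorithm.

```python
from collections import Counter

def sample_llm(prompt: str, n: int) -> list[str]:
    """Hypothetical stand-in for n parallel LLM calls.

    In practice this would query a real model with temperature > 0;
    here we return fixed strings so the sketch is self-contained.
    """
    return ["Paris", "Paris", "Lyon", "Paris", "Paris"][:n]

def marginalized_answer(prompt: str, n: int = 5) -> tuple[str, float]:
    """Cluster sampled strings by exact match and return the mode.

    This is a crude marginalization over the model's answer
    distribution: the returned fraction estimates the probability
    mass the model places on the most frequent answer.
    """
    samples = sample_llm(prompt, n)
    counts = Counter(samples)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(samples)

answer, prob = marginalized_answer("What is the capital of France?")
print(answer, prob)  # → Paris 0.8
```

Richer variants replace exact-match clustering with semantic clustering of the sampled strings, or weight each sample by its sequence probability rather than counting uniformly.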
Bio: Nikolay Malkin is a postdoctoral researcher at Mila and Université de Montréal, in Prof. Yoshua Bengio's group. His research interests include probabilistic inference over structured objects, induction of compositional structure in deep generative models, and applications to vision and language. Before joining Mila, he received his Ph.D. (in pure mathematics) from Yale University in 2021.