John Hewitt

Assistant Professor of Computer Science, Columbia University.

Visiting Researcher, Google DeepMind.

jh5020 [at] columbia.edu

I am a researcher interested in developing neural language systems, deeply understanding them, and precisely controlling them, in service of people's access to information and useful tools. Feel free to look me up on Google Scholar or Twitter, or see my CV.

Advising

Join my lab @ Columbia.

Course Materials

I strive to provide open access to my teaching materials. You can find materials for my new natural language processing course, a version of Columbia Computer Science 4705. Older videos of my Stanford CS 224N lectures are available on YouTube.

More about me

I did my PhD research at Stanford Computer Science, as part of the NLP group. I'm grateful to have been co-advised by Chris Manning and Percy Liang, and to have been supported by an NSF Graduate Research Fellowship. Before that, I did my undergrad studies at Penn.


Publications

2026

Improving Parametric Knowledge Access in Reasoning Language Models.
Melody Ma, John Hewitt.
Preprint.
(pdf) (blog)
Neologism Learning for Controllability and Self-Verbalization.
John Hewitt, Oyvind Tafjord, Robert Geirhos, Been Kim.
ICLR 2026.
(pdf)

2025

Because we have LLMs, we Can and Should Pursue Agentic Interpretability.
Been Kim, John Hewitt, Neel Nanda, Noah Fiedel, Oyvind Tafjord.
arXiv.
(pdf)
We Can't Understand AI Using our Existing Vocabulary.
John Hewitt, Robert Geirhos, Been Kim.
ICML 2025. (Position Paper)
(pdf)

2024

Instruction Following without Instruction Tuning.
John Hewitt, Nelson F. Liu, Christopher D. Manning, Percy Liang.
arXiv.
(pdf) (code) (blog)
Model Editing with Canonical Examples.
John Hewitt, Sarah Chen, Lanruo Lora Xi, Edward Adams, Percy Liang, Christopher D. Manning.
arXiv.
(pdf) (code)
A non-archival version won Honorable Mention for Best Paper at the R0-FoMo Workshop at NeurIPS 2023.
Closing the Curious Case of Neural Text Degeneration.
Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal.
ICLR 2024.
(pdf) (code)

2023

Backpack Language Models.
John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang.
ACL 2023 (long papers). (Outstanding Paper Award).
(pdf) (blog) (code)
(backpackmodels.science)
Character-level Chinese Backpack Language Models.
Hao Sun, John Hewitt.
BlackBoxNLP 2023.
(pdf) (code)
Lost in the Middle: How Language Models Use Long Contexts.
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang.
TACL 2023.
(pdf) (code)

2022

Truncation Sampling as Language Model Desmoothing.
John Hewitt, Christopher D. Manning, Percy Liang.
Findings of EMNLP 2022 (long papers).
(pdf) (blog) (code)
JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset.
Ruth-Ann Armstrong, John Hewitt, Christopher D. Manning.
Findings of EMNLP 2022 (long papers).
(pdf) (blog) (talk) (dataset) (code) (Vox video)

2021

Conditional probing: measuring usable information beyond a baseline.
John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning.
EMNLP 2021 (short papers).
(pdf) (blog) (code) (codalab)
On the Opportunities and Risks of Foundation Models.
Bommasani et al. (100+ authors). John Hewitt, co-lead of the Interpretability section.
Whitepaper.
(pdf)
Probing artificial neural networks: Insights from neuroscience.
Anna Ivanova, John Hewitt, Noga Zaslavsky.
Brain2AI 2021.
(pdf)
Refining Targeted Syntactic Evaluation of Language Models.
Benjamin Newman, Kai-Siang Ang, Julia Gong, John Hewitt.
NAACL 2021 (short papers).
(pdf) (code)

2020

RNNs can generate bounded hierarchical languages with optimal memory.
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning.
EMNLP 2020 (long papers).
(pdf) (blog) (code:analytic) (code:learning) (codalab)
The EOS Decision and Length Extrapolation.
Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning.
BlackBoxNLP 2020. (Outstanding Paper Award).
(pdf) (code)
Emergent Linguistic Structure in Artificial Neural Networks Trained by Self-Supervision.
Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, Omer Levy.
Proceedings of the National Academy of Sciences. 2020.
(pdf)
Finding Universal Grammatical Relations in Multilingual BERT.
Ethan A. Chi, John Hewitt and Christopher D. Manning.
ACL 2020 (long papers).
(pdf) (bib) (code) (viz)

2019

Designing and Interpreting Probes with Control Tasks.
John Hewitt and Percy Liang.
EMNLP 2019 (long papers). (Runner Up Best Paper Award).
(pdf) (bib) (blog) (code) (codalab) (slides) (talk).
A Structural Probe for Finding Syntax in Word Representations.
John Hewitt and Christopher D. Manning.
NAACL 2019 (short papers).
(pdf) (bib) (blog) (code) (nlp highlights podcast) (slides) (talk).
Simple, Fast, Accurate Intent Classification and Slot Labeling for Goal-Oriented Dialogue Systems.
Arshit Gupta*, John Hewitt* and Katrin Kirchhoff.
SIGDIAL 2019.
(pdf)
*: Equal contribution; authors listed alphabetically

2018

A Distributional and Orthographic Aggregation Model for English Derivational Morphology.
Daniel Deutsch*, John Hewitt* and Dan Roth.
ACL 2018 (long papers).
(pdf)
*: Equal contribution; authors listed alphabetically
Learning Translations via Images with a Massively Multilingual Image Dataset.
John Hewitt*, Daphne Ippolito*, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya and Chris Callison-Burch.
ACL 2018 (long papers).
(pdf)
*: Equal contribution; authors listed alphabetically
XNMT: The eXtensible Neural Machine Translation Toolkit.
Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, and Liming Wang.
AMTA 2018.
(pdf)

2017

Learning Translations via Matrix Completion.
Derry Tanti Wijaya, Brendan Callahan, John Hewitt, Xiao Ling, Marianna Apidianaki, and Chris Callison-Burch.
EMNLP 2017 (long papers).
(pdf)

2016

Automatic Construction of Morphologically-Motivated Translation Models for Highly Inflected Low-Resource Languages.
John Hewitt, Matt Post, David Yarowsky.
AMTA 2016.
(pdf)

Invited Talks

Interplay [interpretability] research is alignment research with a big bet.
INTERPLAY Workshop @ COLM. October, 2025.
Instruction Following without Instruction Tuning.
Deep Learning: Classics and Trends (ML Collective). November, 2024.
Instruction Following without Instruction Tuning.
Bay Area Language Interest Group (Bayli). November, 2024.
Instruction Following without Instruction Tuning.
University of Washington. November, 2024.
Instruction Following without Instruction Tuning.
University of Pennsylvania. November, 2024.
Understanding Language Models through Discovery and by Design.
UMichigan. March, 2024.
Understanding Language Models through Discovery and by Design.
Northwestern. March, 2024.
Understanding Language Models through Discovery and by Design.
Harvard. February, 2024.
Understanding Language Models through Discovery and by Design.
NYU. February, 2024.
Understanding Language Models through Discovery and by Design.
Columbia. February, 2024.
Backpack Language Models.
Apple. August 7, 2023.
Backpack Language Models.
Princeton NLP. August 4, 2023.
Backpack Language Models.
Columbia NLP. July 19, 2023.
Backpack Language Models.
Cornell Tech NLP. July 18, 2023.
Backpack Language Models.
NYU. July 17, 2023.
Backpack Language Models.
Anthropic. May 10, 2023.
Backpack Language Models.
Schütze Lab, LMU Munich. May 1, 2023.
Backpack Language Models.
Rycolab, ETH Zurich. April 27, 2023.
Surviving Grad School.
ACL Year-Round Mentorship Panel. July 11, 2022.
A Natural Language Processing perspective on supervised analysis of neural representations.
EvLab, MIT. December 2, 2020.
The Unreasonable Syntactic Expressivity of RNNs.
USC ISI NLP Seminar. (video) November 5, 2020.
Language Probes as V-information Estimators.
NLP with Friends. September 9, 2020.
Probing Neural NLP: Ideas and Problems.
Berkeley NLP Seminar. November 18, 2019.
Emergent Linguistic Structure in Neural NLP.
Amazon AI. July 25, 2019.
A Structural Probe for Finding Syntax in Word Representations.
NLP Highlights Podcast. May, 2019.

Abstracts

RNNs can generate bounded hierarchical languages with optimal memory.
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning.
2020 Conference on the Mathematical Theory of Deep Learning (abstracts).
Semantic Bootstrapping in Frames: A Computational Model of Syntactic Category Acquisition.
John Hewitt, Jordan Kodner, Mitch Marcus, and Charles Yang.
Conference of the Cognitive Science Society (CogSci) 2017 (member posters). (pdf) (abstract)

Patents

Determining training data sizes for training smaller neural networks using shrinking estimates.
John Hewitt, Adhiguna Kuncoro, Aida Nematzadeh.
US Patent App US18/932,554, priority to EP23206829.6A, October 2023. (application)
Capturing Rich Response Relationships with Small-Data Neural Networks.
John Hewitt.
US Patent App 15/841,963. December 2017. (granted). (application)

Projects

Self-Attention and Transformers lecture notes

I wrote a lecture on Transformers in 2021 in my role as Head TA for Stanford's CS 224N: Natural Language Processing with Deep Learning; a recording is available on YouTube. Anna Goldie updated the lecture in 2022, and I updated it again in 2023, writing brand-new lecture notes to accompany it. The updated slides are also available.

Pretraining lecture

I wrote a lecture on Pretraining for the same course! The 2021 version is available on YouTube.

Model analysis and explanation lecture

I wrote a lecture on analysis and explanation of NLP models for the same course! The 2021 version is available on YouTube.