John Hewitt
Email: jh5020@columbia.edu
Learning from, and learning to generate, natural language is one of the core strategies in modern artificial intelligence. Systems built from the tools learned in this class are increasingly deployed in the world. This section (Section 2) provides a generative-models-focused introduction to this field of natural language processing, with the goal of understanding and implementing the foundational ideas beneath state-of-the-art systems.
Topics will include: language modeling, neural network design, text tokenization, web-scale text datasets, representation learning, NLP tasks and evaluation, accelerators like GPUs and TPUs, pretraining, posttraining, reinforcement learning, and many others.
For this class, it would be useful to be familiar with any of: linear algebra, Python programming, probability, differential calculus. We've provided a set of notes for filling some gaps in preparation: Lecture Note 0. See here for a PDF.
Lectures: Tuesdays & Thursdays, 2:40 PM – 3:55 PM
Location: Northwest Corner 501
We'll be using Ed discussion forums, and Gradescope for assignment submission. You should have been added automatically to both. If you just enrolled, ping us to sync the Canvas roster.
Week | Tuesday (Date) | Thursday (Date) | Assignments / Notes |
---|---|---|---|
1 | Introduction, Language Modeling (notes) Tue Sep 2 | Tokenization (notes) Thu Sep 4 | A0 out a0.pdf a0.ipynb |
2 | Background Review (notes) Tue Sep 9 | Representation Learning 1 (Architectures) (notes) Thu Sep 11 | |
3 | Representation Learning 2 (Learning) (notes) Tue Sep 16 | Tasks and Evaluation (notes) Thu Sep 18 | A0 due Tuesday @ 2:30PM |
4 | Parallelization and GPUs Tue Sep 23 | Exam 1 (on lectures through Sept 18) Thu Sep 25 | |
5 | Parallelizable Architectures Tue Sep 30 | Self-Attention and Transformers Thu Oct 2 | |
6 | Finetuning Tue Oct 7 (John maybe @ COLM) | Pretraining 1 Thu Oct 9 | |
7 | Generation Algorithms (notes) Tue Oct 14 | Posttraining 1: Instruction Following Thu Oct 16 | |
8 | Posttraining 2: Reinforcement Learning Tue Oct 21 | Exam 2 Thu Oct 23 | |
9 | Experimental Design Tue Oct 28 | Retrieval and Tools Thu Oct 30 | |
10 | No class (Election Day) Tue Nov 4 | AI Safety Thu Nov 6 | |
11 | Bias, Fairness, Privacy Tue Nov 11 | History of NLP Thu Nov 13 | |
12 | Guest Lecture 1 Tue Nov 18 (Maybe @ Flatiron) | Interpretability and Analysis Thu Nov 20 | |
13 | Guest Lecture 2 Tue Nov 25 | No class (Thanksgiving) Thu Nov 27 | |
14 | Looking to the Future Tue Dec 2 | Final Project Help Thu Dec 4 | |
This grading breakdown is provisional and subject to change.
Letter grades will be determined by the teaching staff as a function of the following breakdown; cutoffs for each letter grade will be decided at the end of the class rather than set in advance. All written elements of the assignments, as well as the final project writeups, must be written in LaTeX and submitted as PDF.
AI tools (e.g., ChatGPT, Cursor, Claude Code) are fully allowed for Assignments 1–4. While I recommend doing the assignments on your own (or with minimal AI hints) as prep for exams, you may use AI to fully solve them if you wish. It is your responsibility to ensure that submitted code and math are correct.
AI tools are also allowed for the final project, both in coding and writing. However, students must take responsibility for all written content and supporting code submitted.
No AI tools are allowed during exams, which will be written in class.
Names | Day | Time | Location |
---|---|---|---|
Daniel, Nick | Monday | 3:00–5:00 PM | CEPSR 620 |
John | Tuesday | 10:00–11:30 AM | CEPSR 724 |
Andrew, Chatiya | Tuesday | 12:30–2:30 PM | CSB 488 |
Melody, Noah | Wednesday | 1:00–3:00 PM | CSB 488 |
John | Thursday | 10:00–11:30 AM | CEPSR 724 |
There is no attendance policy; attend as you wish. That said, I strongly advise students to attend guest lectures, out of thanks and respect for our guest lecturers.
Please see the grading section for our policies on AI tools in this class. Otherwise, please refer to the Faculty Statement on Academic Integrity and the Columbia University Undergraduate Guide to Academic Integrity.
The teaching team is committed to accommodating students with disabilities in line with the Faculty Statement on Disability Accommodations.