Instructor: John Hewitt — Columbia University — Department of Computer Science
The use of large language models pervades artificial intelligence research and, increasingly, our world. We should expect to understand these systems, as we expect to understand any other technology. Curiously, as of now, the best methods for aligning these systems make precious little use of our work toward understanding their internal workings. This PhD seminar in natural language processing brings students together with the shared goal of synthesizing, critiquing, and extending recent (and not-so-recent) research in the distinct subfields of interpretability and alignment.
The first three weeks will consist of an intensive review of the mathematics and technical aspects of large language models—their architecture, pretraining, and alignment—as well as attempts to understand them. Then, students will present research papers to the rest of the class, which will jointly assess, critique, and extend those papers.
There are no formal prerequisites, though having taken COMS 4705 (Natural Language Processing) would be useful. I am admitting PhD students primarily, though I will send an interest form to the waitlist to gauge research interest for the remaining slots in the class.
Lectures: Fridays, 1:10–2:00 PM
Location: TBD
We'll use Ed for discussion forums and Gradescope for assignment submission. You should have been added automatically to both. If you just enrolled, ping us to sync the Canvas roster.
| Date | Lecture | Notes |
|---|---|---|
| Jan 23 | Foundations I: Transformer LMs | |
| Jan 30 | Foundations II: Interpretability | Last day to add courses |
| Feb 6 | Foundations III: Alignment | |
| Feb 13 | Student Presentations | |
| Feb 20 | Student Presentations | |
| Feb 27 | Student Presentations | |
| Mar 6 | Student Presentations | |
| Mar 13 | Student Presentations | |
| Mar 20 | No class | Spring Recess |
| Mar 27 | Student Presentations | |
| Apr 3 | Student Presentations | |
| Apr 10 | Student Presentations | |
| Apr 17 | Student Presentations | |
| Apr 24 | Student Presentations | |
| May 1 | Wrap-up | |
Each session features two paper presentations. For each paper, three students take on distinct roles:
| Role | Responsibility |
|---|---|
| Background Investigator | Provides context on the paper's motivation, related work, and historical significance |
| Interpretability Researcher | Analyzes the paper through the lens of furthering our understanding of model internals |
| Alignment Researcher | Evaluates the paper's implications for AI safety and alignment |
With 20 papers across 10 sessions and 3 roles per paper, each student will present exactly twice during the semester.
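As a quick check on that arithmetic (the implied enrollment of 30 is an inference from these numbers, not stated elsewhere in the syllabus):

```latex
% 2 papers per session x 10 presentation sessions = 20 papers
% 20 papers x 3 roles per paper = 60 presentation slots
% 60 slots / 2 presentations per student = 30 students (inferred enrollment)
20 \times 3 = 60 \text{ slots}, \qquad \frac{60 \text{ slots}}{2 \text{ per student}} = 30 \text{ students}.
```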
This grading breakdown is provisional and subject to change.
Letter grades will be determined by the teaching staff as a function of the following breakdown; cutoffs for each letter grade will be decided at the end of the semester rather than set in advance. All written deliverables must be typeset in LaTeX and submitted as PDF.
| Component | Weight |
|---|---|
| Research Paper Presentation | 50% |
| Research Review | 50% |
Students are allowed to use AI tools in whatever capacity they desire. The content students submit is their responsibility alone.
This course has no required textbook; I will provide lecture notes for the fundamentals, and then we will be reading and presenting papers.
Attendance is expected: in general, you should be at effectively every lecture. However, I dislike grading on attendance, so there is no penalty for missing class, and I understand that everyone will need to miss a lecture or two.
Please see the grading section for our policies on AI tools in this class. Otherwise, please refer to the Faculty Statement on Academic Integrity and the Columbia University Undergraduate Guide to Academic Integrity.
The teaching team is committed to accommodating students with disabilities in line with the Faculty Statement on Disability Accommodations.