COMS 4705: Natural Language Processing

Columbia University, Fall 2025, Section 2

John Hewitt, Instructor
Nick Deas, Staff
Andrew Tang, Staff
Melody Ma, Staff
Chaitya Shah, Staff
Daniel Zhang, Staff
Noah Foster, Staff
Joey Huang, Staff

Instructor

John Hewitt
Email: jh5020@columbia.edu

Course Description

Learning from, and learning to generate, natural language is one of the core strategies in modern artificial intelligence. Systems built from the tools learned in this class are increasingly deployed in the world. This section (Section 2) provides a generative models-focused introduction to this field of natural language processing, with the goal of understanding and implementing the foundational ideas beneath state-of-the-art systems.

Topics will include: language modeling, neural network design, text tokenization, web-scale text datasets, representation learning, NLP tasks and evaluation, accelerators like GPUs and TPUs, pretraining, posttraining, reinforcement learning, and many others.

Prerequisites

For this class, it would be useful to be familiar with any of: linear algebra, Python programming, probability, differential calculus. We've provided a set of notes for filling some gaps in preparation: Lecture Note 0. See here for a PDF.

Schedule

Lectures: Tuesdays & Thursdays, 2:40 PM – 3:55 PM
Location: Northwest Corner 501

We'll use Ed Discussion for the course forum and Gradescope for assignment submission. You should have been added to both automatically. If you just enrolled, ping us to sync the Canvas roster.

Lectures & Lecture Notes

This schedule is provisional and subject to change.
Background Lecture Notes (PDF)
Week | Tuesday (Date) | Thursday (Date) | Assignments / Notes
1 | Tue Sep 2: Introduction, Language Modeling (notes) | Thu Sep 4: Tokenization (notes) | A0 out: a0.pdf, a0.ipynb
2 | Tue Sep 9: Background Review (notes) | Thu Sep 11: Representation Learning 1 (Architectures) (notes) |
3 | Tue Sep 16: Representation Learning 2 (Learning) (notes) | Thu Sep 18: Tasks and Evaluation (notes) | A0 due Tuesday @ 2:30PM; A1 out: a1.ipynb
4 | Tue Sep 23: Building a Machine Translation System | Thu Sep 25: Exam 1 (on lectures through Sept 18) | Exam 1 Prep
5 | Tue Sep 30: GPUs and Parallelizable Architectures (PDF notes) | Thu Oct 2: Self-Attention and Transformers (PDF notes) (slides) | A1 Due Thursday
6 | Tue Oct 7: More Transformers & Pretraining (PDF notes) (slides) | Thu Oct 9: Pretraining II | A2 Written out; A2 Code out
7 | Tue Oct 14: Generation Algorithms (notes) | Thu Oct 16: Posttraining 1: Instruction Following | (Tentative) Final Project Proposal out, Oct 16; Proposal Guidelines; Practical Tips; A2 Written Due Fri Oct 17
8 | Tue Oct 21: Posttraining 2: Reinforcement Learning | Thu Oct 23: Exam 2 |
9 | Tue Oct 28: Experimental Design | Thu Oct 30: Retrieval and Tools | A2 Code due Tue Oct 28 @ 2:40pm
10 | Tue Nov 4: No class (Election Day) | Thu Nov 6: AI Safety |
11 | Tue Nov 11: Bias, Fairness, Privacy | Thu Nov 13: History of NLP |
12 | Tue Nov 18: Guest Lecture 1 (Maybe @ Flatiron) | Thu Nov 20: Interpretability and Analysis |
13 | Tue Nov 25: Guest Lecture 2 | Thu Nov 27: No class (Thanksgiving) |
14 | Tue Dec 2: Looking to the Future | Thu Dec 4: Final Project Help |

Grading

This grading breakdown is provisional and subject to change.

Letter grades will be determined by the teaching staff as a function of the following breakdown; the cutoff for each letter grade will be decided at the end of the class rather than set in advance. All written elements of the assignments, as well as the final project writeups, must be written in LaTeX and submitted as PDF.

AI Tools Policy

AI tools (e.g., ChatGPT, Cursor, Claude Code) are fully allowed for Assignments 1–4. While I recommend doing the assignments on your own (or with minimal AI hints) as prep for exams, you may use AI to fully solve them if you wish. It is your responsibility to ensure that submitted code and math are correct.

AI tools are also allowed for the final project, both in coding and writing. However, students must take responsibility for all written content and supporting code submitted.

No AI tools are allowed during exams, which will be written in class.

Office Hours

Names | Day | Time | Location
Daniel, Nick, Joey | Monday | 3:00–5:00 PM | CSB 480
Andrew, Chaitya | Tuesday | 12:30–2:30 PM | CSB 488
John | Wednesday | 5:00–6:30 PM | CEPSR 724
John | Thursday | 10:00–11:30 AM | CEPSR 724
Melody, Noah | Friday | 2:00–4:00 PM | CSB 453

Materials and Expectations

This course has no required textbook; we use our own lecture notes, provided here. These lecture notes will be supplemented by optional readings of open-access research papers. As detailed in the grading section, this course will have four assignments, two exams, and a large final project in which students will be expected to propose, execute, and write up a report on a natural language processing project with the help of the teaching staff.

Students will be evaluated and given feedback by their assigned mentor at two intermediate points in the final project process to help ensure expectations are understood. We additionally provide a document containing practical suggestions on designing and carrying out your projects: the Practical Tips For Final Projects notes. Provisional guidelines for each final project submission, with a brief description of each, are included below:

  1. Final Project Proposal: Students will carry out and reflect on an AI-assisted literature review (using a chosen LLM and AI2's ScholarQA) on an NLP topic of their choosing that they may focus on in their final projects. They will also briefly detail the project they plan to carry out, including the primary research question(s), task, data, neural approach, baselines, and evaluation approach.
  2. Final Project Milestone: Students will report their progress on the final project so far, incorporating feedback received on the proposal, including preliminary results generated, and outlining plans for the remainder of the project.
  3. Final Project Report: Students will describe their experiments and report their findings in the style of an NLP/Deep Learning paper, incorporating feedback received throughout the semester from course staff.

There is no attendance policy; attend as you want, though I strongly advise students to attend guest lectures, out of thanks and respect for our guest lecturers.

Please see the grading section for our policies on AI tools in this class. Otherwise, please refer to the Faculty Statement on Academic Integrity and the Columbia University Undergraduate Guide to Academic Integrity.

The teaching team is committed to accommodating students with disabilities in line with the Faculty Statement on Disability Accommodations.