John Hewitt
Email: jh5020@columbia.edu
Learning from, and learning to generate, natural language is one of the core strategies in modern artificial intelligence. Systems built with the tools taught in this class are increasingly deployed in the world. This section (Section 2) provides an introduction to the field of natural language processing with a focus on generative models, with the goal of understanding and implementing the foundational ideas beneath state-of-the-art systems.
Topics will include: language modeling, neural network design, text tokenization, web-scale text datasets, machine translation, summarization, accelerators like GPUs and TPUs, linguistics and the structure of language, reinforcement learning, and many others.
For this class, it would be useful to be familiar with linear algebra, Python programming, probability, and differential calculus. We've provided a set of notes to fill some gaps in preparation: Lecture Note 0 (also available here as a PDF).
Lectures: Tuesdays and Thursdays, 2:40 PM – 3:55 PM
Location: TBD
Background Lecture Notes (PDF)

Week | Tuesday | Thursday |
---|---|---|
1 | Introduction, Language Modeling | Tokenization |
2 | Background Review | Representation Learning 1 (Architectures) |
3 | Representation Learning 2 (Learning Algorithms) | Tasks and Evaluation |
4 | Exam 1 | Parallelization and GPUs |
5 | Parallelizable Architectures | Self-Attention and Transformers |
6 | Finetuning | Pretraining 1 |
7 | Generation Algorithms | Posttraining 1: Instruction Following |
8 | Posttraining 2: Reinforcement Learning | Exam 2 |
9 | Experimental Design | Retrieval and Tools |
10 | AI Safety | Bias, Fairness, Privacy |
11 | History of NLP | Debugging Language Models |
12 | Guest Lecture 1 | Interpretability and Analysis |
13 | Guest Lecture 2 | Looking to the future |
14 | Final Project Help | Final Project Help |
John's Office Hours: TBD
TA Office Hours: TBD