Information about COMS Courses

Important Note for Non-CS/CE Students Regarding Registration: Although the Computer Science department would like to make CS accessible to the broader student population, our course registration priority is our declared CS students. We will open select COMS courses to students in other departments during the Change of Program period.

Full information on Course Registration can be found here.

Topics Course Descriptions

This page contains the descriptions for the COMS 4995 and COMS 6998 TOPICS IN COMPUTER SCIENCE courses that have been taught. Please note that these change from semester to semester and may not be offered again.

COMS 4995: Mathematics of Machine Learning and Signal Recognition | Homayoon Beigi

Prerequisites: Advanced Calculus and Linear Algebra. Knowledge of Differential Equations would be helpful.

Mathematics of Machine Learning and Signal Recognition provides the mathematical background for addressing in-depth problems in machine learning, as well as the treatment of signals, especially time-dependent signals and specifically non-stationary time-dependent signals, although spatial signals such as images are also considered. The course provides the essentials of several mathematical disciplines which are used in the formulation and solution of the problems in the above fields. These disciplines include Linear Algebra and Numerical Methods, Complex Variable Theory, Measure and Probability Theory (as well as statistics), Information Theory, Metrics and Divergences, Linear Ordinary and Separable Partial Differential Equations of Interest, Integral Transforms, Decision Theory, Transformations, Nonlinear Optimization Theory, and Neural Network Learning Theory. There will be in-depth coverage of many Neural Network Architectures, including CNN, TDNN, RNN/LSTM, Transformer, and Conformer architectures.

COMS 6998: Hyperscale Infrastructure | Jason Nieh

Prerequisites: W4118

Study of hyperscale computing infrastructures and technologies used to support billions of users based on case studies and research papers from AWS, Google, Meta, Microsoft, Netflix, and others. A course project is required.

COMS 6998: Introduction to Deep Learning and LLM based Generative AI Systems | Parijat Dube and Chen Wang

Prerequisites: An introductory graduate-level machine learning course. Working knowledge of Python and PyTorch, and experience using Jupyter Notebook.

This course serves as a graduate-level introduction to Deep Learning systems, with an emphasis on LLM-based Generative AI systems. The course will cover several topics related to Deep Learning (DL) systems and their performance. Both algorithmic and system-related building blocks of DL systems will be covered, including DL training algorithms, network architectures, and best practices for performance optimization. The latter half of the course will explore Large Language Models (LLMs) in depth, covering key areas and advanced topics including attention mechanisms, transformer models, prompt engineering, LLM applications, pre-training strategies, Reinforcement Learning from Human Feedback (RLHF), efficient LLM serving techniques, fine-tuning methods, and benchmarking specifically for LLMs. Students will gain practical experience working on different stages of the LLM life cycle, including model pretraining, fine-tuning, and deployment. The assignments will be mostly hands-on, involving standard DL and LLM frameworks (PyTorch, vLLM) and open-source technologies. COMS 6998 015.


COMS W4995 Advanced Algorithms | Alexandr Andoni

The class covers classic and modern algorithmic ideas that are central to many areas of Computer Science. The focus is on the most powerful paradigms and techniques for designing algorithms and measuring their efficiency. The topics will include hashing, sketching, dimension reduction, spectral graph theory, optimization (linear programming, gradient descent, IPM), multiplicative weights, compressed sensing, and others.

The class is designed as a “grad intro to algorithms” class, and is thus an advanced version of “Analysis of Algorithms” (COMS 4231), both in terms of content as well as pace. You need not have taken 4231, but some algorithmic exposure is expected (see prerequisites below). Hence it is suitable for those of you who have seen some algorithms class (like 4231 or easier), and/or want to take an in-depth algorithms class. The evaluation is based on homeworks and a final project.


COMS 4995 Advanced Systems Programming | Hans Montero & Jae Woo Lee

Prerequisites: COMS 3157, CSEE 3827, COMS 3134 (or equivalents).

This course focuses on advanced systems programming, covering modern interfaces, libraries, and services used in today’s UNIX-based operating systems. Students will gain an understanding of how they are implemented and learn how to use them.



This course will provide visibility into the journey of building a venture-backed high-growth start-up (from inception to Series B).

First-time founders make a number of mistakes that are easily avoidable. A typical mistake is to make assumptions about the market and market needs. The technology needs to work but a lot more is required to get a start-up off the ground — fundraising, sales, marketing, hiring, operations, and finance.

This course will provide recipes for avoiding failure and answer common questions such as — how to find and validate an idea, how to find a co-founder, how to find investors and raise funding, how to run board meetings, how to build a team (who and how to hire at what stage), how to compensate early employees and advisors, what’s a cap table and how to manage it, etc.

There will be a number of guest speakers ranging from successful CEOs to top venture capitalists. In addition to providing different perspectives, this will help students expand their professional networks.


Upon successful completion of the course, you will have mastered the fundamental knowledge required to succeed in:

  • Competing with confidence in the ICPC contest series: Columbia Locals, Greater New York Regionals, North America Championships, and World Finals. Please check the Columbia team site here.
  • Passing with confidence any entry-level technical interview at top-tier companies.
  • Attaining high ratings on competitive programming websites (e.g., Codeforces, AtCoder, and LeetCode).

We will go over topics and strategies required to succeed in the competitive programming contests!

COMS 4995 Data-Driven Design for Social Innovation | Nakul Verma

Professor Nakul Verma is offering a new interdisciplinary project-based COMS 4995 course in Fall 2023 “Data Driven Design for Social Innovation”.

Course Description
Cross-disciplinary project-based course where teams of students from various departments work together on creating products or services to address important social problems. We will introduce and study a diverse set of ethnographic, psychological, and data science tools to help students generate customer insights, and build and test prototypes for a large-scale social challenge. Student projects will be closely guided by the instructors and domain experts. The final deliverables will be shared with the domain expert to determine potential application of the concepts beyond the course. This could include advancing ideas through the formation of a startup, incorporating ideas into the domain expert’s work, or establishing partnerships with organizations working on similar challenges.

COMS 4995 DEEP LRNG FOR COMP VISION | Belhumeur, Peter

Recent advances in Deep Learning have propelled Computer Vision forward. Applications such as image recognition and search, unconstrained face recognition, and image and video captioning, which only recently seemed decades off, are now being realized and deployed at scale. This course will look at the advances in computer vision and machine learning that have made this possible. In particular, we will look at Convolutional Neural Nets (CNNs), Recurrent Neural Nets (RNNs), Long Short Term Memories (LSTMs) and their application to computer vision. We will also look at the datasets needed to feed these data-hungry approaches, both how to create them and how to leverage them to address a wider range of applications. The course will have homework assignments and a final project; there will be no exams.


COMS 4995 Design using C++ | Bjarne Stroustrup

Prerequisites: Graduate standing or senior undergraduate and an interest in the use of programming languages and tools.

Note: There will be no recording or remote access.

Abstract: This course will use the C++ language and some C++ libraries as examples of software design and design decisions. It is not a C++ programming course, though it will use the principles and design decisions that underlie the major C++ features as prime examples and see how they are expressed in actual features and libraries. The evolution of those ideals and the support offered are also examined. Some experience with C++ and other languages is assumed. The book “A Tour of C++ (3rd edition)” will be read over the first half of the course to give the students a common foundation in C++.

“Design” is not primarily a book skill; it involves a variety of principles and techniques that must be practiced for mastery. Consequently, the second half of the course is devoted to 3-person projects involving design, implementation, documentation, and presentation of projects of the students’ choice (after approval from the professor). Students are expected to attend other students’ presentations.



This is a DSI course; please refer to the DSI website for cross-registration instructions for non-DS students.

This course is designed as an introduction to the elements that constitute the skill set of a data scientist. The course will focus on the utility of these elements in common tasks of a data scientist, rather than their theoretical formulation and properties. The course provides a foundation of basic theory and methodology with applied examples to analyze large engineering, business, and social data for data science problems. Hands-on experiments with R or Python will be emphasized.



In 2011, Marc Andreessen penned a famous Wall Street Journal article “Why Software Is Eating The World.” Almost a decade later, software is still eating the world. Companies like Google, Amazon, Uber, and Airbnb are revolutionizing entire industry sectors, and even traditional enterprises have to embrace this transition to software and automate numerous tasks within their organizations in order to remain competitive. This software revolution is driven primarily by two technology trends. On the client side, billions of users now own computers and smartphones with broadband Internet access, providing each of them “instant access to the full power of the Internet, every moment of every day.” On the backend, cloud services and readily available software tools vastly simplify creating software startups in many industries, without the need to invest in infrastructure or employee training. For instance, when WhatsApp was acquired by Facebook for approximately $19.3 billion, it served hundreds of millions of users worldwide but had merely 50 employees.

These technology trends not only enable software to flourish, but also fundamentally change our software engineering process. Broadband access enables developers to run their software in the cloud for users to access via browsers or mobile apps, so-called Software-as-a-Service (SaaS). The developers of SaaS products continuously gather user feedback and behavior analytics, quickly refine existing product features or build new ones, and deploy to production in a matter of minutes to test out their ideas, so-called Agile Development. This style of close collaboration with customers and fast iteration of product ideas is in stark contrast with how software was engineered two decades ago.

In this course, we will study modern software engineering practices including topics such as SaaS architecture, behavior-driven development, Ruby on Rails, and DevOps.
For details on the topics we will cover, please check out the Course Syllabus page.


COMS W4995 Geometric Data Analysis | Andrew Blumberg

The goal of this class is to introduce approaches to analyzing data presented as finite metric spaces using ideas from algebraic topology and differential geometry. Prerequisites are a grounding in basic probability, statistics, and linear algebra. The class will focus on rigorous mathematical foundations and applications drawn from computational genomics.

COMS 4995 HACKING 4 DEFENSE | Blaer, Paul  

Course Website

Solve complex technology problems critical to our National Security with a team of engineers, scientists, MBAs, and policy experts. In a crisis, national security initiatives move at the speed of a startup, yet in peacetime they default to decades-long acquisition and procurement cycles. Startups operate with continual speed and urgency 24/7. Over the last few years they’ve learned how to be not only fast, but extremely efficient with resources and time using lean startup methodologies. In this class, student teams develop technology solutions to help solve important national security problems. Student teams take actual national security problems and learn how to apply the Lean LaunchPad and Lean Startup principles (“business model canvas,” “customer development,” and “agile engineering”) to discover and validate customer needs and to continually build iterative prototypes to test whether they understood the problem and solution. Teams take a hands-on approach requiring close engagement with actual military, Department of Defense and other government agency end-users. Team applications required. Limited enrollment.

Taught by Professor Paul Blaer and Jason Cahill, Hacking for Defense™ is a university-sponsored class that allows students to develop a deep understanding of the problems and needs of government sponsors in the Department of Defense and the Intelligence Community. In a short time, students rapidly iterate prototypes and produce solutions to sponsors’ needs. This course provides students with an experiential opportunity to become more effective in their chosen field, with a body of work to back it up. For government agencies, it allows problem sponsors to increase the speed at which their organization solves specific, mission-critical problems. For more information check out

COMS 4995 Innovation & Design Lab | Gary Zamchick

Abstract: Innovation & Design Lab inculcates the innovative mindset needed to envision inventive applications, deliver meaningful experiences to end users, and generate valuable products and services. The class is open to both undergraduate and graduate students.

Gary Zamchick has worked in and out of places like AT&T Labs Research, Sarnoff, and IBM as well as boutique design firms like Rockwell Group and Parson’s Institute of Information Mapping. His projects are as diverse as designing the Disney World parade, innovation labs for Coca-Cola, banking experiences for Citibank and Tata, kiosks for intelligence agencies, on-demand learning models for IBM, and illustrating the best-selling “French for Cats” humor books. He co-founded WordsEye, one of the first startups to come out of Columbia’s Innovation Entrepreneurship program. Recently, he was responsible for Innovation Strategy at Delos, a wellness company.

COMS W4995 Intro to Agile Project Management | Tristian Boutros

Project management skills are essential for professionals to meet the ever-growing demands of today’s businesses and to succeed in the global economy. From technology to finance, and construction to healthcare, project management skills are applicable across every industry. The Introduction to Agile Project Management course is tailored both to individuals who have some project management-related experience but aspire to enhance these skills, and to individuals who are just starting out in their careers and wish to gain new skills that will serve them for a lifetime. As a student enrolled in this course, you will gain the critical knowledge and foundation needed to initiate, plan, execute, and manage a successful engineering project using both traditional and agile project management approaches. Upon the completion of this course, you will be able to describe the basic values, principles and practices of Agile project management and Scrum, develop a project or product roadmap, and apply the skills and tools needed to successfully execute projects to completion. This course will also outline the importance of organizational culture in project activities and how to develop and implement a project management framework that works for your company. Course work will explore essential concepts and techniques in project management and how to apply them, including terminology, methodologies, people management, process management, leadership, and enterprise strategy integration. Upon completion of the course, you will possess the knowledge to begin to study for multiple industry certifications as delivered by the Project Management Institute (PMI), Scrum Alliance and


COMS 4995 INTRO TO DATA VISUALIZATION | Swinehart, Christian

Course Description:

This course is a hands-on introduction to design principles, theory, and software techniques for visualizing data. Classes will be a combination of lecture, design studio, and lab. Through readings, design critique, and code assignments, students will learn how visual representations can help in the understanding of complex data, and how to design and evaluate visualizations for the purpose of analysis or communication. Students will develop skills in processing data and building interactive visualizations using D3. Topics include visual perception, exploratory data analysis, task analysis, graphic design, visual hierarchy, narrative, etc.

Special Enrollment Procedures: 

Prospective students should add themselves to the waitlist and will be enrolled in the course based upon their completion of a warm-up assignment during the first week of the semester.

COMS 4995 INTRO TO NETWORKS AND CROWDS | Chaintreau, Augustin

This course covers the fundamentals underlying information diffusion and incentives on networked applications. Applications include but are not limited to social networks, crowdsourcing, online advertising, rankings, information networks like the world wide web, as well as areas where opinion formation and the aggregate behavior of groups of people play a critical role. Structural concepts introduced and covered in class include random graphs, small world, weak ties, structural balance, cluster modularity, preferential attachment, Nash equilibrium, Potential Game and Bipartite Graph Matching. The class examines the following dynamics: link prediction, network formation, adoption with network effect, spectral clustering and ranking, spread of epidemics, seeding, social learning, routing game, all-pay contest and truthful bidding.


Propositional logic: syntax and semantics, Resolution and Propositional Sequent Calculus soundness and completeness. First order logic: syntax and semantics, First Order Sequent Calculus soundness and completeness. Gödel’s Incompleteness theorems. Computability: Recursive and recursively enumerable functions, Church’s thesis, unsolvable problems.

COMS 4995 Mathematics of Machine Learning and Signal Recognition | Homayoon S Beigi

Mathematics of Machine Learning and Signal Recognition provides the mathematical background for addressing in-depth problems in machine learning, as well as the treatment of signals, especially time-dependent signals and specifically non-stationary time-dependent signals, although spatial signals such as images are also considered. The course provides the essentials of several mathematical disciplines which are used in the formulation and solution of the problems in the above fields. These disciplines include Linear Algebra and Numerical Methods, Complex Variable Theory, Measure and Probability Theory (as well as statistics), Information Theory, Metrics and Divergences, Linear Ordinary and Separable Partial Differential Equations of Interest, Integral Transforms, Decision Theory, Transformations, Nonlinear Optimization Theory, and Neural Network Learning Theory. The requirements are Advanced Calculus and Linear Algebra. Knowledge of Differential Equations would be helpful.



Course outline and overall goals

  • Mobile application development based on Apple’s iOS operating system and Swift programming language.
  • The class will work on Apple Mac computers running Xcode; students will learn how to develop applications for iPhone, iPad, and Apple Watch devices.
  • Students will create mobile applications based on the latest programming patterns and frameworks in use in the mobile industry today.
  • The class covers fundamentals essential to understanding all aspects of app development from concept to deployment on the App Store.
  • Taught in a team environment. Students are required to present their project proposals and deliver a fully functional mobile application as a final project.

Class Topics

  • Lecture 1 – Intro to Mobile Application Development & Xcode IDE
  • Lecture 2 – Intro to Swift & Storyboard (Constraints) – my first app – [Homework 1 (10%)]
  • Lecture 3 – The App Lifecycle (App Delegate) & more Swift syntax; Intro to Source Code Control (Git)
  • Lecture 4 – Storyboard Navigation (Back, Tabbar, Page) [Red/green/blue]
  • Lecture 5 – More Swift (Delegation & TableViewControllers) [Homework 2 (10%)]
  • Lecture 6 – More Swift (JSON parsing from file) [Shopping list]
  • Lecture 7 – More Swift (Completion blocks) & Networking APIs (REST) [Homework 3 (20%)]
  • Lecture 8 – iOS Frameworks (Core Location/MapKit/ARKit/HealthKit)
  • Lecture 9 – Data Persistence (User defaults/CoreData) HW3 MIDTERM DUE (20%)
  • Lecture 10 – UI/UX (Accessibility/i18n/l10n) [Final Project Description]
  • Lecture 11 – Networking and Authentication for Enterprise Apps (REST/OAUTH/GRAPHQL)
  • Lecture 12 – Application Analytics Tools (app statistics, crash reports) / [Final Project storyboard]
  • Lecture 13 – Deploying your app (AppStore, Testflight, Enterprise distribution)
  • Lecture 14 – Final Project presentations to class FINAL PROJECT DUE (50% of Class Grade)




It is very hard to hand-design programs to solve many real world problems, e.g. distinguishing images of cats versus dogs. Machine learning algorithms allow computers to learn from example data, and produce a program that does the job. Neural networks are a class of machine learning algorithms originally inspired by the brain, but which have recently seen a lot of success in practical applications. They are at the heart of production systems at companies like Google and Facebook for image processing, speech-to-text, and language understanding. This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms.

Roughly the first 2/3 of the course focuses on supervised learning — training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last 1/3 focuses on unsupervised learning and reinforcement learning.



Course Description:

In this course, students will learn how to build and maintain open source projects.

Students will be responsible for both supporting an existing open source project of their choosing and creating their own open source project (both in either Python or JavaScript, under the direction of the instructor).

In lieu of homeworks or exams, ongoing progress reports and a final presentation on these two will serve as the grading basis for the course.

The course will include an overview of general software engineering concepts, including version control, testing, modern development workflows, and software licensing, as well as a brief history of open source software including major court cases in copyright, patent, and trademark law.

We will also learn about popular tools for:

  • project management (GitHub, Kanban/Agile boards)
  • testing (GitHub Actions, Jenkins)
  • static analysis and coverage (Codecov, Coveralls)
  • documentation (GitHub Pages, Sphinx, MkDocs, Docusaurus, ReadTheDocs)
  • publishing (npm, PyPI, Anaconda)

Beyond the basics, students will learn how these tools can be integrated together in their open source projects to better manage their development and community.




Prerequisites: COMS 3157 Advanced Programming or the equivalent. Knowledge of at least one programming language and related development tools/environments required. Functional programming experience not required.

Functional programming in Haskell, with an emphasis on parallel programs. The goal of this class is to introduce you to the functional programming paradigm. You will learn to code in Haskell; this experience will also prepare you to code in other functional languages. The first half of the class will cover basic (single-threaded) functional programming; the second half will cover how to code parallel programs in a functional setting.

COMS W4995 Semantic Representations for NLP | Daniel Bauer

Most NLP tasks and applications require some level of understanding of the semantics (meaning) of linguistic expressions. The question of how to represent semantics and how to map surface linguistic expressions to semantic representations is therefore an important part of NLP research. This course will explore some of the challenges surrounding semantic representations in various applications. We will compare a variety of approaches to representing the meaning of words and sentences from an NLP perspective, including symbolic/logic-based representations and resources, as well as modern distributional and multi-modal approaches. Requirements include homework assignments, a paper presentation, and a final project.


COMS 4995 TECH INTERVIEW PREP C++ | Yongwhan Lim

Upon successful completion of the course, you will have mastered the fundamental knowledge required to succeed in any entry-level technical interview at top-tier IT companies (FAANG or equivalent). If you are looking for a job in the IT industry upon graduation, you will be exposed to what software engineers do on a day-to-day basis. We will also touch on some system design and behavioral interview questions but, as those are typically not part of the entry-level technical interview loop, they will not be the main focus of the course.



DNA sequences are life’s information system. The recent technological revolution allowing ubiquitous collection of such sequence data opened up applications from viral identification to molecular barcoding to forensics to efficient long-term data storage and more. The class will present background to the available technologies along with an introduction to what can be gleaned from the data they produce. The class will include a significant hands-on component in hackathon format. The class is open to an interdisciplinary community of undergraduates and graduate students who will mutually benefit one another with their disciplinary skills. Specifically, students with zero background in biology are welcome. Computing background is needed at a very basic level (Intro CS, or Computing in Context, or AP CS or instructor’s approval). A detailed description of a previous run of the class is available in




Course Description: This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications. Applications include Text-to-Speech Synthesis, dialogue systems, and analysis of entrainment, personality, emotion, humor and sarcasm, deception and charisma.

COMS 6998 ADV TOPICS PROJ DEEP LEAR | Belhumeur, Peter

This is a seminar course in which the students read, present, and discuss research papers on deep learning. The focus will be mostly on applications in computer vision, but topics in natural language processing, language translation, and speech recognition will also be read and discussed. It is expected that students taking the course will have prior experience with deep learning and neural network architectures. There will be no assignments other than reading and presentations. However, there will be a final project that students will work on throughout the duration of the semester. Enrollment is capped at 25 students. Instructor permission is required to register.

COMS 6998 ADV WEB DESIGN STUDIO | Chilton, Lydia

This semester, Advanced Web Design Studio is partnering with faculty and students in Journalism and Architecture to design, build and deploy “public interest technology.” We will introduce interdisciplinary design methods and principles for Human Computer Interaction (mixing Architecture, Urban Planning, Computer Science, and Journalism) that respond to the potentials as well as the adverse effects of computation on society today. Our work will be dedicated to leveraging technology to support public interest organizations. We will move beyond short-term metrics (clicks, daily active users, and private profit) to focus on fostering long-term value and local networks. Students will work together to create and deploy interdisciplinary projects in collaborative teams, with the aim of serving the public and using technology to advance justice, equality, and inclusion in society. We will also be alert to the ways in which the language of “public interest” can sometimes hide or offer alibis for other political or private interests. For the first half of the semester, students will do coding and design exercises and complete short readings on journalism, urban planning, and public interest technology. During the second half of the semester, students will iterate on their work and put their projects into action.

A strict prerequisite is to have taken COMS 4170 UI Design, or an equivalent class with both web-based implementation and design components. You must also fill out the prerequisite form that will be available through SSOL.

The faculty instructors in each area are: Lydia Chilton (School of Engineering, Computer Science); Mark Hansen (School of Journalism); Laura Kurgan (Graduate School of Architecture, Planning and Preservation).

Here are some examples of Public Interest Technology:

  • Ushahidi: Since 2008, thousands have used this crowdsourcing platform in disaster settings, from post-election violence to earthquakes and floods worldwide.
  • OneBusAway: Results from Providing Real-Time Arrival Information for Public Transit (2010)
  • Discrimination in Online Ad Delivery: Google ads, black names and white names, racial discrimination, and click advertising (2013)
  • Anti Eviction Mapping: a volunteer data-visualization, data analysis, and storytelling collective documenting the dispossession and resistance upon gentrifying landscapes (since 2013)


Algebraic techniques have been used in nearly every area of computer science. This course will develop some of these techniques and explore a broad range of applications. We will mostly see applications to algorithm design and complexity theory, but we will touch on applications to other areas as well. The course will focus on four main topics:

Algebraic graph algorithms. The fastest known algorithms for several important graph problems use algebraic tools. Often, the graph problem appears (at first glance) to have nothing to do with algebra, and algorithm designers have found surprising algebraic connections.

The polynomial method. If one can capture a computational task using a simple polynomial, then fast algorithms for manipulating polynomials can often be used to solve the problem more efficiently. In contrast, if one can show that a task cannot be captured by simple polynomials, this can often be used to prove complexity-theoretic lower bounds.

Matrix rigidity. Some of the most important and applicable algorithms are for computing linear transforms such as Fourier transforms. Matrix rigidity is a powerful but poorly understood tool for understanding their complexity. A matrix is called rigid if it cannot be decomposed as the sum of a low-rank matrix and a sparse matrix. It remains an open problem to prove that any explicit family of matrices is rigid.

Matrix multiplication algorithms. Matrix multiplication is prevalent in computational science and is known to be equivalent to many other computational problems in linear algebra. We will see how fast matrix multiplication algorithms are designed, including some recent developments.
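
For the matrix rigidity topic above, the informal definition can be written as a single quantity. The sketch below uses the usual rigidity function from the literature; the symbols A, S, r, and the notation for counting nonzero entries are introduced here for illustration and do not come from the course description.

```latex
% Rigidity of a matrix A for target rank r: the fewest entries that must
% be changed (the sparse part S) so that what remains has rank at most r.
\[
  \mathcal{R}_A(r) \;=\; \min \bigl\{\, \|S\|_0 \;:\; \operatorname{rank}(A - S) \le r \,\bigr\},
  \qquad \|S\|_0 = \text{number of nonzero entries of } S .
\]
```

In this notation, a matrix is rigid when the value stays large even for fairly large r, which is the same as saying that A cannot be written as a low-rank matrix plus a sparse one.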
Evaluation will be primarily based on a final project. I will provide a list of suggestions for the project, but students are encouraged to relate their project to their interests. There will also be occasional (at most 4) homework assignments, and students will be asked to scribe a lecture.

This is a graduate-level course on Cloud Computing and Big Data with an emphasis on hands-on design and implementation. You will learn to design and build extremely large-scale systems and learn the underlying principles and building blocks in the design of such large-scale applications. You will be using real Cloud platforms and services to learn the concepts to build such applications.

The first part of the course covers basic building blocks such as essential cloud services for web applications, cloud programming, virtualization, containers, Kubernetes, and microservices. We shall learn these concepts by using and extending capabilities available in real clouds such as Amazon AWS and Google Cloud.

The second part of the course will focus on the various stacks used in building an extremely large-scale system, such as (i) Kafka for event logging and handling, (ii) Spark and Spark Streaming for large-scale computations, (iii) Elasticsearch for extremely fast indexing and search, (iv) various NoSQL database services such as DynamoDB and Cassandra, (v) cloud-native development with Kubernetes, and (vi) cloud platforms for Machine Learning and Deep Learning based applications. Several real-world applications will be covered to illustrate these concepts and research innovations.

Students are expected to participate in class discussions, read research papers, work on three programming assignments, and conduct a significant course project. Given that this is a very hands-on course, it is expected that students have a decent programming background.

Prerequisite: Good programming experience in any language, Concepts of Web Applications and Systems
Reading Material: Lecture Notes, Reading Papers, Reference Textbooks, Lots of Engineering Docs
Grading: 3 Programming Assignments (35%), 2 Quizzes (25%), Project (40%)

COMS 6998 COMPUTATION AND THE BRAIN | Papadimitriou, Christos

Prerequisites: Familiarity with computation and algorithms. Some familiarity with Neuroscience; or willingness to venture into a vast, complex, and fascinating field. Familiarity with machine learning useful.

Despite brilliant and accelerating progress in experimental Neuroscience, there is little progress towards an overarching understanding of how the Brain works. This course will cover some basic material and results in Neuroscience, both classical and recent, from a computational perspective, that is, under the hypothesis that the way the Brain works is largely computational.



Empirical Methods of Data Science is a seminar for students seeking an in-depth understanding of how to conduct empirical research in computer science. In the first part of the seminar, we will discuss how to critically examine previous research, build and test hypotheses, and collect data in the most ethical and robust manner. As we explore different means of data collection, we will dive into ethical concerns in research. Next, we will explore how to most effectively analyze different data sets and how to present the data in engaging and exciting ways. In the last part of the seminar, we will hear from different researchers on the methods they use to conduct research, leading to further conversations and in-class debates about when and how to use particular research methods. The focus will be primarily on relatively small data sets, but we will also address big data.


COMS 6998 FOUNDATIONS OF BLOCKCHAIN | Roughgarden, Timothy

This course will provide a rigorous introduction to the mathematical analysis of blockchain protocols, including the relevant tools and definitions from game theory, economics, and distributed computing. The target audience is beginning PhD students with interest and experience in theoretical computer science.


This course is a graduate seminar on research in the verification of system software. The goal of the class is to get the students to build provably correct software. The structure of the class will consist of students presenting research papers during lecture, and students working on a significant research project. We expect students to start working on the project in the first week or two, and continue for the entire semester, culminating in a draft research paper. Examples of projects include building a certified file system, simple OS kernel, hypervisor, database system, etc. We expect projects will build on tools such as Coq and Z3. To get students up to speed with these tools, we will offer several tutorial lectures on Coq, using parts of the Software Foundations textbook.


Fundamentals of Speech Recognition is a comprehensive course, covering all aspects of automatic speech recognition from theory to practice. In this course such topics as Anatomy of Speech, Signal Representation, Phonetics and Phonology, Signal Processing and Feature Extraction, Probability Theory and Statistics, Information Theory, Metrics and Divergences, Decision Theory, Parameter Estimation, Clustering and Learning, Transformation, Hidden Markov Modeling, Language Modeling and Natural Language Processing, Search Techniques, Neural Networks, Support Vector Machines and other recent machine learning techniques used in speech recognition are covered in some detail. Also, several open-source speech recognition software packages are introduced, with detailed hands-on projects using Kaldi to produce a fully functional speech recognition engine. The lectures cover the theoretical aspects as well as practical coding techniques. The course is graded based on a project. The midterm (40% of the grade) is in the form of a two-page proposal for the project, and the final (60% of the grade) is an oral presentation of the project plus a 6-page conference-style paper describing the results of the research project. The instructor uses his own textbook for the course: Homayoon Beigi, “Fundamentals of Speaker Recognition,” Springer-Verlag, New York, 2011. Every week, the slides of the lecture are made available to the students.


COMS 6998 HIGH PERF MACH LEARNING | Parijat Dube and Kaoutar El-Maghraoui


During the past decades, the field of High-Performance Computing (HPC) has been about building supercomputers to solve some of the biggest challenges in science. HPC is where cutting edge technology (GPUs, low latency interconnects, etc.) is applied to the solution of scientific and data-driven problems.

One of the key ingredients to the current success of ML is the ability to perform computations on very large amounts of training data. Today, the application of HPC techniques to ML algorithms is a fundamental driver for the progress of Artificial Intelligence.
In this course, you will learn HPC techniques that are typically applied to supercomputing software, and how they are applied to obtain the maximum performance out of ML algorithms. You will also learn about techniques for building efficient ML systems. The course is based on PyTorch, CUDA programming, and MPI.


Prerequisites: Knowledge of computer architecture and operating systems through course work; C/C++: intermediate programming skills; Python: intermediate programming skills; Knowledge of machine learning concepts and deep learning algorithms through course work

Topics covered

ML/DL and PyTorch basics, PyTorch performance, performance optimization in PyTorch, parallel performance modeling, intro to CUDA and CUDA programming, math libraries for ML (cuDNN), DNN architectures (CNN, RNN, LSTM, Attention, Transformers) in PyTorch, intro to MPI, distributed ML, distributed PyTorch algorithms, parallel data loading and ring reduction, quantization and model compression, hardware/software co-design and co-optimization of DNNs



COMS E6998 Human-Computer Interaction | Brian Smith

This course is a graduate-level seminar in which we meet once a week to discuss several human–computer interaction (HCI) research papers. We will cover a different research area within HCI each week. The class is open to graduate students and, with instructor permission, undergraduate students.

Students will be expected to read all of the papers, write weekly reflections about the papers, and present one or two of them to the class in a presentation that is roughly 20 mins long. In our discussions, we will be talking about the papers themselves as well as the research strategy and methodology behind them.

Stanford’s CS 376 is an HCI seminar in a similar vein. You can see the following page for their most recent syllabus of papers:
(Note that research seminar courses, including both ours and Stanford’s, tend to change the list of papers covered every year.)

Our syllabus will be similar in format, and I will announce the full list of papers on the first day of class.

COMS 6998 Internet Measurement (syllabus, COMS, ELEN) | Ethan Katz-Bassett

In this course, we will investigate important problems, techniques, results, and challenges that arise in measuring the Internet. We will explore both what measurements tell us about the Internet and how we can leverage what they tell us to improve the Internet and services that run on it. We will learn to measure various aspects of the Internet, including topology, routing and routing policies, performance, failures, traffic, and applications. Researchers often talk about Internet measurement as being analogous to astronomy, in that we take observations from afar in order to understand how a system works. The class will be discussion-based. Students will conduct small research projects in groups — in past offerings of this course, most students published their projects in peer-reviewed venues.


In this course, we will study two aspects of how machine learning (ML) interacts with Earth’s climate.
First, we will investigate how ML can be used to tackle climate change. We will focus on use cases from transportation, manufacturing, food and agriculture, waste management, and atmospheric studies. We will ask questions like: what are the requirements for applying ML to such problems? How can we evaluate the effectiveness of our analyses?

Second, we will consider ML’s own impact on the climate. We will focus on the energy and computation that goes into designing, training, and deploying modern ML systems. We will ask questions like: how can we accurately track and account for ML’s own energy footprint? What strategies can we employ to minimize it?

By the end of this course, you will learn about modern statistical and causal ML methods and their applications to the climate. We will concentrate on modeling real-world phenomena using probability models, with an emphasis on vision, time series forecasting, uncertainty quantification, and causality. In addition, you will gain a deeper understanding of the carbon footprint of ML itself and explore how to mitigate it.



There has been tremendous progress recently in the development of models to generate language for different purposes and to produce summaries of input documents. This success has largely come about due to rapid advances in large language models. In this class, we will explore four main topics: language generation, multimodal generation, summarization, and long-form question answering. We will study large language models that have been used for these different tasks and the issues that arise. For example, how feasible is it to control the output of these large language models along different dimensions? How do we evaluate the textual output of such systems? How can we develop models to produce or summarize creative texts? What are the ethical issues surrounding these kinds of models?

This class will follow a seminar style. It will include a mixture of lectures from the instructor, guest lectures, and student presentations. Students who take the class will have three main assignments. 1. For each class there will be a reading assignment consisting of several research papers. Students are responsible for reading all papers. 2. Students will be part of a presentation group which will be responsible for presenting a paper and raising critiques about one or more papers in class. 3. Each student will carry out a semester-long project. Each student will be required to submit a proposal for the project near the beginning of class, turn in a midterm progress report, and submit a final report and code for their project. They will do a short video presentation which will be made available to the class, the TAs, and the instructor for viewing. There will be no midterm or final exam.

Pre-requisite: COMS 4705, Natural Language Processing or equivalent.



This course will cover several topics in performance evaluation of machine learning and deep learning systems. Major topics covered in the course: an algorithmic and system-level introduction to Deep Learning (DL); DL training algorithms, network architectures, and best practices for performance optimization; the ML/DL system stack on cloud; tools and benchmarks (e.g., DAWNBench) for performance evaluation of ML/DL systems; practical performance analysis using standard DL frameworks (TensorFlow, PyTorch) and resource monitoring tools (e.g., nvidia-smi); performance modeling to characterize scalability with respect to workload and hardware; and performance considerations with special techniques like transfer learning, semi-supervised learning, and neural architecture search. Emphasis will be on getting working knowledge of tools and techniques to evaluate performance of ML/DL systems on cloud platforms. The assignments will involve running experiments using standard DL frameworks (TensorFlow, PyTorch) and working with open source DL technologies. The students will gain practical experience working on different stages of the DL life cycle (development and deployment) and understanding and addressing related system performance issues.


Course Website

This is an advanced, PhD-level topics class in the theory of quantum computing, focusing in particular on cutting-edge topics in quantum complexity theory and quantum cryptography. The theme of the topics this year will be:

Complexity of quantum states and state transformations. An n-qubit quantum state appears to be exponentially more complex than an n-bit string; a classical description of such a state requires writing down 2^n complex amplitudes in general. Here are some questions that will motivate our explorations:

  • How does this complexity manifest itself in different information-processing tasks?
  • How can we quantify this complexity?
  • Can this complexity be used for cryptographic applications?
  • Does the complexity of quantum states relate to traditional concepts in complexity theory such as circuit complexity or space complexity or time complexity?
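
As a rough illustration of the counting claim above, the following minimal sketch builds a classical description of one particular n-qubit state (the uniform superposition) as a plain list of amplitudes. The function names are hypothetical and chosen for this example only; the point is just that the list length doubles with each added qubit.

```python
import math


def amplitude_count(n_qubits: int) -> int:
    """Number of complex amplitudes in a general n-qubit state vector."""
    return 2 ** n_qubits


def uniform_superposition(n_qubits: int) -> list:
    """Classical description of the n-qubit uniform superposition.

    Every one of the 2^n basis states gets the same amplitude,
    1 / sqrt(2^n), so the squared magnitudes sum to 1.
    """
    dim = amplitude_count(n_qubits)
    amplitude = complex(1.0 / math.sqrt(dim))
    return [amplitude] * dim


# Compare the size of an n-bit string with the size of an n-qubit description.
for n in (1, 2, 10, 20):
    print(f"{n} bits -> {n} values; {n} qubits -> {amplitude_count(n)} amplitudes")
```

Storing such a list quickly becomes infeasible as n grows, which is one way to see why the questions above about quantifying and exploiting this complexity are nontrivial.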

A list of possible topics:

  • Shadow tomography of quantum states
  • Quantum state synthesis/unitary synthesis
  • Hamiltonian learning
  • Quantum key distribution/quantum money
  • QMA(2) and the power of unentanglement
  • State complexity, AdS/CFT, and quantum gravity
  • Quantum state complexity and statistical zero knowledge
  • Pseudorandom states and unitaries
  • Classical vs. quantum proofs

The following notes from Scott Aaronson will be frequently consulted:

The goal is to explore the results and (more importantly) the questions in this field. Course work may include: scribe notes, a few assignments, and a course project.


Must have taken: Basic linear algebra and basic probability theory.

You must have taken at least one of:

  • An upper-level CS theory course such as complexity theory/cryptography/algorithms, or
  • Quantum computing, or
  • Quantum physics.

It is not required that you have taken a course in quantum computing/quantum information before, but it would definitely be helpful. The lectures will introduce the concepts as needed, but will go rather quickly.

The course will expect high levels of mathematical maturity and comfort with proofs. To get a sense of what this course is like, take a look at the lecture notes from this Fall 2020 topics course.



This course involves reading, presenting, and discussing papers on programming language design. In the first lecture, I outline some dimensions of language design and give examples of aims and ideals that have driven the development of languages. I also present a (necessarily incomplete) list of classical design papers. In the second lecture, I present a paper for discussion to give an idea of how that may be done. This is not a comparative languages class nor a type-theory class. The focus of presentations should be on ideals, aims, and principles so that language features are presented simply as examples. For older languages it is possible and desirable to discuss how the ideas worked out in actual use and evolved. Uncritical “sales jobs” of languages – as are not uncommon in early descriptions of new languages – are not acceptable as presentations. You might, of course, choose to present your favorite language, but maintain a critical and objective perspective.


The course will discuss the latest research in representation learning. Each week, we will discuss a topic by reading papers in computer vision, NLP, machine learning, and robotics. Attendance is required and every student will be expected to speak during each class.


Over the past few years, Machine Learning (ML) has made tremendous progress, achieving or surpassing human-level performance for a diverse set of tasks including image classification, speech recognition, and playing games such as Go. These advances have led to widespread adoption and deployment of ML in security- and safety-critical systems such as self-driving cars, malware detection, and aircraft collision avoidance systems. This wide adoption of ML techniques presents new challenges as the predictability and correctness of such systems are of crucial importance. Unfortunately, ML systems, despite their impressive capabilities, often demonstrate unexpected or incorrect behaviors in corner cases for several reasons such as biased training data, overfitting, and underfitting of the models. In safety- and security-critical settings, such incorrect behaviors can lead to disastrous consequences such as a fatal collision of a self-driving car. For example, a Google self-driving car recently crashed into a bus because it expected the bus to yield under a set of rare conditions but the bus did not. A Tesla car in autopilot crashed into a trailer because the autopilot system failed to recognize the trailer as an obstacle due to its “white color against a brightly lit sky” and the “high ride height.” Such corner cases were not part of Google’s or Tesla’s test set and thus never showed up during testing. Other examples include Microsoft’s Tay chatbot tweeting racist words because it was misled by malicious Twitter users, and Google removing “gorilla” as an image class after its image classification algorithm incorrectly classified dark-skinned people as gorillas.

These challenges have drawn huge attention from researchers in machine learning, security, systems, and programming language communities. A number of techniques and theories have been proposed to increase the robustness and security of machine learning. In this course, we will study the most practical and most important of these techniques and theories with a focus on deep learning. For details on the topics we’ll cover, please go to the Course Syllabus page.


Human Data Interaction studies the interface between humans and data. What types of interfaces are suitable for different data tasks? Further, creating human data interfaces is extremely challenging because the responsiveness of the interface directly depends on the system architecture as well as the interface design. What system innovations are needed to simplify how effective human data interfaces can be created and used?

Human Data Interaction is a nascent field, and we will study modern research in data visualization, HCI, data analysis, and data management systems. This seminar course will center around reading, reviewing, and discussing research papers. Each session will consist of a round table discussion of the week’s readings to understand the context and the technical details, and to brainstorm follow-up research questions. Students will work in small teams on a semester-long research project that is within the scope of the course topic.

COMS 6998: The Economics of Cyber Security | Adam Hastings & Simha Sethumadhavan

Prerequisites: COMS 4181 or equivalent; and at least two additional COMS or CSEE 41xx or 48xx classes or equivalent with instructor approval.

Description: Designing modern computer systems requires computer systems engineers to balance factors of human behavior, ethics, incentives, economics, policy, laws, and regulations, in addition to traditional engineering requirements like performance, cost, scalability, and security. This class will prepare computer systems engineers to design systems to meet these modern, multi-faceted needs.


One of the most significant transformations in the computer sector over the last decade has been the rise of cloud computing. Companies are progressively transferring their tasks to external cloud platforms and utilizing advanced global services that were previously unattainable within individual data centers. Nonetheless, the construction and utilization of cloud systems entails tackling numerous intricate research challenges. This research seminar will explore both industrial and academic contributions to cloud computing. Participants will engage in guest presentations delivered by prominent authorities in the field, analyze and guide discussions on research papers, and collaborate in small groups to undertake a research project throughout the semester.

COMS 6998: Topics in Mobile X | Xia Zhou

Prerequisites: Because this is a high-level course, we assume that students already have a solid understanding of the basics of some areas, including networking, communication, and machine learning. There are multiple ways of demonstrating a networking and machine learning background, including taking undergraduate networking classes, machine learning classes, or equivalent classes, and experience working on related projects. The course project is an important part of this class. Students must have good programming skills and project experience.

Abstract: Topics in Mobile X is an upper-level course on mobile computing and ubiquitous systems, covering a broad range of advanced and interdisciplinary topics in mobile computing, networking, and applications. All these topics center on the unique challenges faced in bringing computation, networking, and applications to the mobile computing platform, a platform that is constrained in form factor, energy, and computation power, with examples such as smartphones, smart watches or wrist bands, smart glasses, and more. Example topics include mobile communication and networking, mobile sensing, mobile human-to-computer interaction (HCI), mobile learning/AI, and mobile security.

For each topic, we will study both conventional perspectives and recent research trends. Students will learn key principles in mobile computing research, understand the state of the art in this research area, and gain experience in carrying out original research through class projects. The end goal is to generate publishable results by the end of the term. In addition, students will practice their skills in paper reading/writing and public presentation in the form of weekly research essays, class discussions, paper critiques, project reports, and presentations.

As a research-oriented course, this course is based on research papers in top-tier conferences. No particular textbooks are required, but the following textbooks are good references for students to refresh their networking background and better understand the papers.

Course Website:



This is an advanced seminar course that will focus on the latest research in machine learning for robotics. More specifically, we study how machine learning and data-driven methods can influence a robot’s perception, planning, and control. For example, we will explore how robots can learn to perceive and understand their 3D environment, how they can learn from experience to make reasonable plans, and how they can reliably act upon the complex environment based on their understanding of the world. Students will read, present, and discuss the latest research papers on robot learning as well as obtain experience in developing a learning-based robotic system in the course projects.

COMS 6998 Types, Languages, and Compilers | Stephen Edwards

An advanced course on modern programming language and type theory with
a focus on functional languages and concrete implementations. The
goal is to become fluent in the concepts and formalisms typical of
papers in conferences such as POPL.

Judgments, Logic, and Inductive Definitions;
Context-free Grammars;
the Lambda Calculus;
the Simply Typed Lambda Calculus;
Type Inference, Hindley-Milner-Damas;
Operational Semantics;
Proof Assistants, Dependent Type Systems,
and the Curry-Howard Correspondence.

Prerequisites: COMS 4115 Programming Languages and Translators;
COMS 3203 Discrete Mathematics;
COMS 3261 Computer Science Theory. Equivalents acceptable.

COMS E6998 Virtual Technologies for Cloud Computing | Jason Nieh

Prerequisites: COMS 4118 Operating Systems or the equivalent.

This course will cover the underlying technologies that enable major cloud computing providers to deliver computing resources to consumers on-demand over the Internet, focusing primarily on virtualization and Infrastructure as a Service (IaaS) cloud models. Topics to be covered will include many aspects of the design and implementation of hypervisors and containers, ranging from architectural support for virtualization to live migration to cloud computing security. The course will have homework assignments and a final project, both of which will involve systems programming.

COMS 6998: Unconditional Lower Bounds and Derandomization | Rocco Servedio

Prerequisites: The most important prerequisite is “mathematical maturity”; you should be comfortable with proofs, basic discrete math, combinatorics, probability, and linear algebra. If you have questions about your mathematical readiness you should contact the instructor before enrolling. COMS 4236 (Introduction to Computational Complexity) is good preparation, but this is not required.

The course will not include any programming.

Description: This is a course about unconditional lower bounds and complexity-theoretic pseudorandom generators for restricted models of computation. We will study many unconditional lower bounds and derandomization results for interesting and important models of computation, covering various “gems” of the field that have been discovered over the past few decades. There will be an emphasis on techniques and we will highlight many open questions along the way.


This course will cover selected topics in VR and AR. There are two main components, with everyone participating in both: papers and projects.

Papers: Throughout the semester, we will be reading, presenting, and discussing papers that address important research themes in VR and AR. Each of you will be individually responsible for reading and commenting on all the papers and for participating in a team presentation of a selected set of papers related to one theme.

Projects: You will be developing a project in a small (individual or 2-3 person) team using the Unity 3D development environment. Projects will use either headset-based VR or phone-based AR. Through the generosity of a Provost Emerging Technology Grant, I expect to lend an Oculus Quest to each of you doing a headset-based VR project. Thus, members of all team projects can work remotely from each other without needing to share equipment. While you can generate your own project idea, I will encourage you to choose from a collection of projects co-advised with faculty and students in other schools, ranging from Medicine and Dentistry, to Social Work.

Prerequisites: You will ideally have taken COMS W4172 (3D User Interfaces and Augmented Reality) or equivalent and be comfortable with developing in Unity. However, I am willing to admit students who don’t have this background, with the understanding that you will need to get up to speed with Unity, starting early in September.

Please fill out the survey linked to the course SSOL entry, which is a necessary precondition to being admitted to the course.

Computing in Context (COMS W1002) is a computer science course for non-majors, emphasizing computational methods for text analysis while teaching Python programming. The class combines lectures in basic computer science with lectures and projects applying those methods to multiple disciplines within the liberal arts, including digital humanities, social science, and economics and finance. For more information, see It’s a Computing Revolution in the Liberal Arts.

Introduction to Computing for Engineers and Applied Scientists (ENGI E1006) introduces computational thinking, algorithmic problem solving, and Python programming with projects designed around applications in science and engineering. Intended for first-year SEAS students.

Emerging Scholars Program (COMS W1404) is a 1-point, pass/fail, semester-long program that concentrates on the collaborative and problem-solving aspects of computer science. Weekly workshops give students an extra opportunity to explore CS-related topics and fields.


Previous Years: 2021 - 2022 | 2022 - 2023 | 2023 - 2024 | 2024 - 2025

Updated 10/27/2023