Below are the course descriptions for the Fall 2022 topics courses. This page will be updated as new information becomes available. The official record of what will be offered is the Directory of Classes; please use this page only as a resource in your course planning. Undergraduates should consult their CS faculty advisor to see whether a course counts for their track. MS students should consult the Topics page (if a course is not listed there, consult your CS faculty advisor).
For questions regarding Data Science courses please email: DataScience-Registration@columbia.edu
COMS 4995.001 HACKING 4 DEFENSE | Blaer, Paul
COMS 4995.002 PARALLEL FUNCTIONAL PROGR | Edwards, Stephen
COMS 4995.003 EMPIRICAL METHODS DATA SC | Levine, Michelle
COMS 4995.004 NEURAL NETWORKS DEEP LEAR | Zemel, Richard
COMS 4995.005 LOGIC AND COMPUTABILITY | Pitassi, Toniann
COMS 4995.006 DEEP LRNG FOR COMP VISION | Belhumeur, Peter
COMS 4995.007 TECH INTERVIEW PREP C++ | Lim, Yongwhan
COMS 4995.008 COMPETITIVE PROGRAMMING | Lim, Yongwhan
COMS 4995.009 DESIGN USING C++ | Stroustrup, Bjarne
COMS 4995.010 APPLIED DEEP LEARNING | Gordon, Joshua
COMS 4995.011 APPLIED MACHINE LEARNING | Pappu, Vijay
COMS 4995.012 ELEMENTS FOR DATA SCIENCE | Gibson, Bryan
COMS 4995.020 Mathematics of Machine Learning and Signal Recognition | Beigi, Homayoon S.
COMS 4995.021 BLDNG SUCCESSFUL STARTUP | Agarwal, Apoorv
COMS 6998.001 TOPICS IN ROBOTIC LEARNIN | Song, Shuran
COMS 6998.002 ADV WEB DESIGN STUDIO | Chilton, Lydia
COMS 6998.004 DIALOG SYSTEMS (CONVERSNL | Yu, Zhou
COMS 6998.005 REPRESENTATION LEARNING | Vondrick, Carl
COMS 6998.006 ADV SPOKEN LANGUAGE PROCE | Hirschberg, Julia
COMS 6998.007 TOPICS DATACENTER NETWORK | Misra, Vishal
COMS 6998.008 ENGR WEB3 BLOCKCHAIN APPS | Yang, Junfeng
COMS 6998.009 FUND SPEECH RECOGNITION | Beigi, Homayoon
COMS 6998.010 FINE GRAINED COMPLEXITY | Alman, Josh
COMS 6998.011 NATURAL LANG GEN SUMMARIZ | McKeown, Kathleen
COMS 6998.012 PRACT DEEP LEARNING SYS P | Dube, Parijat
COMS 6998.013 FAIR AND ROBUST ALGORITHM | Zemel, Richard
COMS 6998.014 ANALYSIS OF NETWORKS & CR | Chaintreau, Augustin
COMS 6998.015 CLOUD COMPUTING & BIG DAT | Sahu, Sambit
COMS 6998.016 MACHINE LEARNING &CLIMATE | Kucukelbir, Alp
COMS 6998.017 HIGH PERF MACH LEARNING | Dube, Parijat and El-Maghraoui, Kaoutar
COMS 6998.018 READINGS LANGUAGE DESIGN | Stroustrup, Bjarne
COMS 4995.001 HACKING 4 DEFENSE | Blaer, Paul
Solve complex technology problems critical to our national security with a team of engineers, scientists, MBAs, and policy experts. In a crisis, national security initiatives move at the speed of a startup; in peacetime they default to decades-long acquisition and procurement cycles. Startups operate with continual speed and urgency, 24/7, and over the last few years they have learned how to be not only fast but extremely efficient with resources and time using lean startup methodologies. In this class, student teams take actual national security problems and learn how to apply Lean LaunchPad and Lean Startup principles ("business model canvas," "customer development," and "agile engineering") to discover and validate customer needs and to continually build iterative prototypes to test whether they understand the problem and solution. Teams take a hands-on approach requiring close engagement with actual military, Department of Defense, and other government agency end-users. Team applications required; limited enrollment. Taught by Professor Paul Blaer and Jason Cahill, Hacking for Defense™ is a university-sponsored class that allows students to develop a deep understanding of the problems and needs of government sponsors in the Department of Defense and the Intelligence Community. In a short time, students rapidly iterate prototypes and produce solutions to sponsors' needs. This course provides students with an experiential opportunity to become more effective in their chosen field, with a body of work to back it up. For government agencies, it allows problem sponsors to increase the speed at which their organization solves specific, mission-critical problems. For more information, check out http://www.h4di.org/
COMS 4995.002 PARALLEL FUNCTIONAL PROGR | Edwards, Stephen
Prerequisites: COMS 3157 Advanced Programming or the equivalent. Knowledge of at least one programming language and related development tools/environments required. Functional programming experience not required. Functional programming in Haskell, with an emphasis on parallel programs. The goal of this class is to introduce you to the functional programming paradigm. You will learn to code in Haskell; this experience will also prepare you to code in other functional languages. The first half of the class will cover basic (single-threaded) functional programming; the second half will cover how to code parallel programs in a functional setting.
COMS 4995.003 EMPIRICAL METHODS DATA SC | Levine, Michelle
Empirical Methods of Data Science is a seminar for students seeking an in-depth understanding of how to conduct empirical research in computer science. In the first part of the seminar, we will discuss how to critically examine previous research, build and test hypotheses, and collect data in the most ethical and robust manner. As we explore different means of data collection, we will dive into ethical concerns in research. Next, we will explore how to most effectively analyze different data sets and how to present data in engaging and exciting ways. In the last part of the seminar, we will hear from different researchers about the methods they use to conduct research, prompting further conversations and in-class debates about when and how to use particular research methods. The focus will be primarily on relatively small data sets, but we will also address big data.
COMS 4995.004 NEURAL NETWORKS DEEP LEAR | Zemel, Richard
It is very hard to hand-design programs to solve many real-world problems, e.g., distinguishing images of cats versus dogs. Machine learning algorithms allow computers to learn from example data and produce a program that does the job. Neural networks are a class of machine learning algorithms originally inspired by the brain, which have recently seen a lot of success in practical applications. They are at the heart of production systems at companies like Google and Facebook for image processing, speech-to-text, and language understanding. This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms.
Roughly the first 2/3 of the course focuses on supervised learning — training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last 1/3 focuses on unsupervised learning and reinforcement learning.
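As a minimal illustration of the supervised setting (a hypothetical sketch, not course material; the data and learning rate below are invented), gradient descent can fit a one-parameter linear model to labeled examples:

```python
# Minimal supervised learning: fit y = w * x to labeled (x, y) examples
# by gradient descent on mean squared error. Illustrative only.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # labeled examples, roughly y = 2x

w = 0.0    # model parameter
lr = 0.05  # learning rate
for _ in range(200):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 2))  # close to 2.0, the slope that best fits the labels
```

The same loop, scaled up to millions of parameters and examples with backpropagation computing the gradients, is the core of supervised neural-network training.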
COMS 4995.005 LOGIC AND COMPUTABILITY | Pitassi, Toniann
Topics:
- Propositional logic: syntax and semantics; soundness and completeness of Resolution and the Propositional Sequent Calculus.
- First-order logic: syntax and semantics; soundness and completeness of the First-Order Sequent Calculus.
- Gödel's incompleteness theorems.
- Computability: recursive and recursively enumerable functions, Church's thesis, unsolvable problems.
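The semantic side of propositional logic can be made concrete with a brute-force validity checker (an illustrative sketch, not course material): a formula is valid iff it is true under every truth assignment to its variables.

```python
from itertools import product

# Brute-force semantics for propositional logic. Formulas are modeled
# as Python functions over booleans; validity = true under all of the
# 2^n truth assignments. Illustrative only.

def is_valid(formula, num_vars):
    return all(formula(*vals) for vals in product([False, True], repeat=num_vars))

# Law of excluded middle: p or not p is valid.
print(is_valid(lambda p: p or not p, 1))       # True
# p -> q (written as not p or q) is not valid: fails at p=True, q=False.
print(is_valid(lambda p, q: not p or q, 2))    # False
```

Proof systems such as Resolution and the Sequent Calculus replace this exponential search with syntactic derivations; soundness and completeness say the two notions coincide.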
COMS 4995.006 DEEP LRNG FOR COMP VISION | Belhumeur, Peter
Recent advances in deep learning have propelled computer vision forward. Applications such as image recognition and search, unconstrained face recognition, and image and video captioning, which only recently seemed decades off, are now being realized and deployed at scale. This course will look at the advances in computer vision and machine learning that have made this possible. In particular, we will look at convolutional neural nets, recurrent neural nets, transformers, and vision transformers, and their application to computer vision. We will also look at the datasets needed to feed these data-hungry approaches: both how to create them and how to leverage them to address a wider range of applications. The course will have homework assignments and a final project; there will be no exams.
COMS 4995.007 TECH INTERVIEW PREP C++ | Lim, Yongwhan
Upon successful completion of the course, you will have mastered the fundamental knowledge required to succeed in any entry-level technical interview at top-tier IT companies (FAANG or equivalent). If you are looking for a job in the IT industry upon graduation, you will also get exposure to what software engineers do on a day-to-day basis. We will touch on some system design and behavioral interview questions, but as those are typically not part of the entry-level technical interview loop, they will not be the main focus of the course.
COMS 4995.008 COMPETITIVE PROGRAMMING | Lim, Yongwhan
Upon successful completion of the course, you will have mastered the fundamental knowledge required to succeed in:
- Competing with confidence in the ICPC contest series: Columbia Locals, Greater New York Regionals, North America Championships, and World Finals (see the Columbia team site for details).
- Passing any entry-level technical interview at top-tier companies with confidence.
- Attaining high ratings on competitive programming websites (e.g., Codeforces, AtCoder, LeetCode).
We will go over topics and strategies required to succeed in the competitive programming contests!
COMS 4995.009 DESIGN USING C++ | Stroustrup, Bjarne
Course description coming soon…
COMS 4995.010 APPLIED DEEP LEARNING | Gordon, Joshua
This is a DSI course; please refer to the DSI website for cross-registration instructions for non-DS students.
Course description coming soon…
COMS 4995.011 APPLIED MACHINE LEARNING | Pappu, Vijay
This is a DSI course; please refer to the DSI website for cross-registration instructions for non-DS students.
This class offers a hands-on approach to machine learning and data science. It covers the application of machine learning methods such as SVMs, random forests, gradient boosting, and neural networks to real-world datasets, including data preparation, model selection, and evaluation. The class complements COMS W4721 in that it relies entirely on available open-source implementations in scikit-learn and TensorFlow. Apart from applying models, we will also discuss software development tools and practices relevant to productionizing machine learning models.
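The course works in scikit-learn; purely for illustration, the split-fit-evaluate workflow it practices looks like this dependency-free sketch (the data, the nearest-centroid model, and the split are all made up):

```python
import random

# Sketch of the standard evaluation workflow: split data, fit a model
# on the training portion, measure accuracy on the held-out portion.

random.seed(0)
# Two made-up, linearly separable classes of 2-D points.
data = [([x, x + 1], 0) for x in range(20)] + [([x + 30, x + 31], 1) for x in range(20)]
random.shuffle(data)
train, test = data[:30], data[30:]  # hold out 10 examples for evaluation

# "Fit" a nearest-centroid classifier: the mean feature vector per class.
centroids = {}
for label in (0, 1):
    rows = [f for f, y in train if y == label]
    centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

def predict(features):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

accuracy = sum(predict(f) == y for f, y in test) / len(test)
print(accuracy)  # 1.0 on this cleanly separable toy data
```

In scikit-learn the same shape appears as `train_test_split`, `fit`, and `score`, with real models substituted for the toy centroid classifier.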
COMS 4995.012 ELEMENTS FOR DATA SCIENCE | Gibson, Bryan
This is a DSI course; please refer to the DSI website for cross-registration instructions for non-DS students.
This course is designed as an introduction to the elements that constitute the skill set of a data scientist. The course will focus on the utility of these elements in common tasks of a data scientist, rather than on their theoretical formulation and properties. The course provides a foundation of basic theory and methodology, with applied examples of analyzing large engineering, business, and social data for data science problems. Hands-on experiments with R or Python will be emphasized.
COMS 4995.020 Mathematics of Machine Learning and Signal Recognition | Beigi, Homayoon S.
Mathematics of Machine Learning and Signal Recognition provides the mathematical background for addressing in-depth problems in machine learning, as well as the treatment of signals, especially time-dependent signals, specifically non-stationary time-dependent signals (although spatial signals such as images are also considered). The course provides the essentials of several mathematical disciplines used in the formulation and solution of problems in the above fields. These disciplines include linear algebra and numerical methods, complex variable theory, measure and probability theory (as well as statistics), information theory, metrics and divergences, linear ordinary and separable partial differential equations of interest, integral transforms, decision theory, transformations, nonlinear optimization theory, and neural network learning theory. Prerequisites are Advanced Calculus and Linear Algebra; knowledge of differential equations would be helpful.
COMS 4995.021 BLDNG SUCCESSFUL STARTUP | Agarwal, Apoorv
This course will provide visibility into the journey of building a venture backed high-growth start-up (from inception to Series B).
First time founders make a number of mistakes that are easily avoidable. A typical mistake is to make assumptions about the market and market needs. The technology needs to work but a lot more is required to get a start-up off the ground — fundraising, sales, marketing, hiring, operations, and finance.
This course will provide recipes for avoiding failure and answer common questions such as — how to find and validate an idea, how to find a co-founder, how to find investors and raise funding, how to run board meetings, how to build a team (who and how to hire at what stage), how to compensate early employees and advisors, what’s a cap table and how to manage it, etc.
There will be a number of guest speakers ranging from successful CEOs to top venture capitalists. In addition to providing different perspectives, this will help students expand their professional networks.
COMS 6998.001 TOPICS IN ROBOTIC LEARNIN | Song, Shuran
This is an advanced seminar course that will focus on the latest research in machine learning for robotics. More specifically, we study how machine learning and data-driven methods can influence a robot's perception, planning, and control. For example, we will explore how a robot can learn to perceive and understand its 3D environment, how it can learn from experience to make reasonable plans, and how it can reliably act in a complex environment based on its understanding of the world. Students will read, present, and discuss the latest research papers on robot learning, and will gain experience developing a learning-based robotic system in the course projects.
COMS 6998.002 ADV WEB DESIGN STUDIO | Chilton, Lydia
COMS 6998.004 DIALOG SYSTEMS (CONVERSNL | Yu, Zhou
Course description coming soon…
COMS 6998.005 REPRESENTATION LEARNING | Vondrick, Carl
The course will discuss the latest research in representation learning. Each week, we will discuss a topic by reading papers in computer vision, NLP, machine learning, and robotics. Attendance is required and every student will be expected to speak during each class.
COMS 6998.006 ADV SPOKEN LANGUAGE PROCE | Hirschberg, Julia
Course Description: This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications. Applications include text-to-speech synthesis, dialogue systems, and the analysis of entrainment, personality, emotion, humor and sarcasm, deception, and charisma.
Enrollment is via waitlist.
COMS 6998.007 TOPICS DATACENTER NETWORK | Misra, Vishal
Course description coming soon…
COMS 6998.008 ENGR WEB3 BLOCKCHAIN APPS | Yang, Junfeng
The potential applications for blockchains and cryptocurrencies are enormous. The course will cover the technical aspects of cryptocurrencies, blockchain technologies, and distributed consensus. Students will learn how these systems work and how to engineer secure software that interacts with a blockchain system like Bitcoin and Ethereum. For a list of the topics we will cover, please go to the course syllabus page. This course is intended for advanced undergraduates and graduate students.
COMS 6998.009 FUND SPEECH RECOGNITION | Beigi, Homayoon
Fundamentals of Speech Recognition is a comprehensive course covering all aspects of automatic speech recognition, from theory to practice. Topics covered in some detail include the anatomy of speech, signal representation, phonetics and phonology, signal processing and feature extraction, probability theory and statistics, information theory, metrics and divergences, decision theory, parameter estimation, clustering and learning, transformations, hidden Markov modeling, language modeling and natural language processing, search techniques, neural networks, support vector machines, and other recent machine learning techniques used in speech recognition. Several open-source speech recognition software packages are also introduced, with detailed hands-on projects using Kaldi to produce a fully functional speech recognition engine. The lectures cover theoretical aspects as well as practical coding techniques. The course is graded based on a project: the midterm (40% of the grade) is a two-page proposal for the project, and the final (60% of the grade) is an oral presentation of the project plus a 6-page conference-style paper describing the results of the research project. The instructor uses his own textbook for the course: Homayoon Beigi, "Fundamentals of Speaker Recognition," Springer-Verlag, New York, 2011. The slides for each week's lecture are made available to the students.
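At the heart of the hidden Markov modeling covered here is the Viterbi dynamic program, which finds the most likely state sequence for an observation sequence. A toy sketch (the states, probabilities, and observations below are invented for illustration, not course material):

```python
# Viterbi decoding for a toy 2-state HMM. Illustrative only; a real
# recognizer would use log-probabilities and acoustic/language models.

states = ["S1", "S2"]
start = {"S1": 0.6, "S2": 0.4}
trans = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit = {"S1": {"a": 0.5, "b": 0.5}, "S2": {"a": 0.1, "b": 0.9}}

def viterbi(obs):
    # best[s] = (probability of best path ending in s, that path)
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * trans[prev][s] * emit[s][o], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["a", "b", "b"]))  # ['S1', 'S2', 'S2']
```

The same recursion, run over phone- or word-level state graphs with learned parameters, is what Kaldi's decoders perform at scale.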
COMS 6998.010 FINE GRAINED COMPLEXITY | Alman, Josh
The theory of NP-hardness is able to distinguish between “easy” problems that have polynomial-time algorithms, and “hard” problems that are NP-hard. However, it is often not reasonable to label any problem with a polynomial-time algorithm as “easy”, and in practice, it can make a huge difference whether an algorithm runs in, say, linear time, quadratic time, or cubic time. Nonetheless, the theory of NP-hardness is “too coarse” to make distinctions like this.
Fine-Grained Complexity is a new area which aims to address this issue. The theory involves careful “fine-grained reductions” between problems which show that a polynomial speedup for one problem would give a polynomial speedup for another. Through a web of such reductions, Fine-Grained Complexity identifies a small number of algorithmic problems whose best-known algorithms are conjectured to be optimal, and shows that assuming these conjectures, a wide variety of algorithms in many seemingly-different areas must also be optimal.
In this class, we will explore this new area and a variety of applications. We will study problems like Orthogonal Vectors, 3-SUM, and All-Pairs Shortest Paths, which are at the center of these conjectures, and see why we believe they are (or aren't?) difficult. We will give fine-grained reductions to these problems from problems in many areas, including dynamic and approximation algorithms. We will also study the related area of parameterized complexity, which shows how NP-hard problems can become tractable when certain parameters describing the input are guaranteed to be small.
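For a concrete sense of the distinctions fine-grained complexity cares about (an illustrative sketch, not course material): brute force solves 3-SUM in cubic time, a hash set brings it down to quadratic time, and the 3-SUM conjecture asserts that no algorithm runs in O(n^(2-ε)) time for any ε > 0.

```python
# 3-SUM: do any three elements (at distinct indices) sum to zero?
# Fixing the first element and hashing candidates for the middle
# element gives the standard O(n^2) algorithm.

def three_sum(nums):
    n = len(nums)
    for i in range(n):
        seen = set()
        for j in range(i + 1, n):
            # Looking for an earlier k (i < k < j) with
            # nums[i] + nums[k] + nums[j] == 0.
            if -(nums[i] + nums[j]) in seen:
                return True
            seen.add(nums[j])
    return False

print(three_sum([3, -1, -2, 7]))  # True: 3 + (-1) + (-2) == 0
print(three_sum([1, 2, 3]))       # False
```

Fine-grained reductions show that beating this quadratic bound would also speed up a surprising range of geometry and string problems.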
Evaluation will be primarily based on a final project. I will provide a list of suggestions for the project, but students are encouraged to relate their project to their interests. There will also be occasional (at most 3) homework assignments, and students will be asked to scribe a lecture.
(see course website for more details)
COMS 6998.011 NATURAL LANG GEN SUMMARIZ | McKeown, Kathleen
There has been tremendous progress recently in the development of models that generate language for different purposes and produce summaries of input documents. This success has largely come about due to rapid advances in large language models. In this class, we will explore four main topics: language generation, multimodal generation, summarization, and long-form question answering. We will study large language models that have been used for these different tasks and the issues that arise. For example, how possible is it to control the output of these large language models along different dimensions? How do we evaluate the textual output of such systems? How can we develop models to produce or summarize creative texts? What are the ethical issues surrounding these kinds of models?
This class will follow a seminar style. It will include a mixture of lectures from the instructor, guest lectures, and student presentations. Students who take the class will have three main assignments. 1. For each class there will be a reading assignment consisting of several research papers; students are responsible for reading all papers. 2. Students will be part of a presentation group which will be responsible for presenting a paper and raising critiques about one or more papers in class. 3. Each student will carry out a semester-long project. Each student will be required to submit a proposal for the project near the beginning of class, turn in a midterm progress report, and submit a final report and code for their project. They will do a short video presentation which will be made available to the class, the TAs, and the instructor for viewing. There will be no midterm or final exam.
Pre-requisite: COMS 4705, Natural Language Processing or equivalent.
COMS 6998.012 PRACT DEEP LEARNING SYS P | Dube, Parijat
This course will cover several topics in performance evaluation of machine learning and deep learning systems. Major topics covered in the course:
- Algorithmic and system-level introduction to deep learning (DL): DL training algorithms, network architectures, and best practices for performance optimization
- The ML/DL system stack on cloud
- Tools and benchmarks (e.g., DAWNBench) for performance evaluation of ML/DL systems
- Practical performance analysis using standard DL frameworks (TensorFlow, PyTorch) and resource monitoring tools (e.g., nvidia-smi)
- Performance modeling to characterize scalability with respect to workload and hardware
- Performance considerations with special techniques like transfer learning, semi-supervised learning, and neural architecture search
Emphasis will be on gaining working knowledge of tools and techniques to evaluate the performance of ML/DL systems on cloud platforms. The assignments will involve running experiments using standard DL frameworks (TensorFlow, PyTorch) and working with open-source DL technologies. Students will gain practical experience working on different stages of the DL life cycle (development and deployment) and understanding and addressing related system performance issues.
COMS 6998.013 FAIR AND ROBUST ALGORITHM | Zemel, Richard
Website | Special Enrollment Procedures
Standard learning approaches are designed to perform well on average for the data distribution available at training time. Developing robust learning approaches that are not overly sensitive to the training distribution is central to research on domain generalization, out-of-distribution generalization, and fairness. In this course we will focus on research on robust learning methods and on algorithmic fairness, and the links between them. For example, while domain generalization methods typically rely on knowledge of disjoint "domains" or "environments", the fairness literature often uses "sensitive" label information indicating which demographic groups are at risk of discrimination. Algorithms that take these forms of side information into account turn out to have some deep relationships to one another.
This course will survey foundational ideas, recent work, and applications in this area. Evaluation will be based mainly on a project involving original research by the students. Students should already be familiar with the basics of machine learning such as linear algebra, optimization, and probability.
The class will have a major project component.
COMS 6998.014 ANALYSIS OF NETWORKS & CR | Chaintreau, Augustin
This course covers the fundamentals underlying information diffusion and incentives in networked applications. Applications include but are not limited to social networks, crowdsourcing, online advertising, rankings, information networks like the World Wide Web, and areas where opinion formation and the aggregate behavior of groups of people play a critical role. Structural concepts introduced and covered in class include random graphs, small worlds, weak ties, structural balance, cluster modularity, preferential attachment, Nash equilibrium, potential games, and bipartite graph matching. The class examines the following dynamics: link prediction, network formation, adoption with network effects, spectral clustering and ranking, spread of epidemics, seeding, social learning, routing games, all-pay contests, and truthful bidding.
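As one concrete example of ranking dynamics on an information network (an illustrative sketch with a made-up three-node graph, not course material), PageRank-style power iteration repeatedly redistributes rank along links until it converges:

```python
# PageRank-style power iteration on a tiny directed graph.
# The graph and the damping factor d are illustrative.

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
nodes = list(links)
d = 0.85  # damping factor
rank = {n: 1 / len(nodes) for n in nodes}

for _ in range(50):
    # Each node keeps a (1 - d) baseline and receives a d-weighted
    # share of rank from every node linking to it.
    new = {n: (1 - d) / len(nodes) for n in nodes}
    for src, outs in links.items():
        for dst in outs:
            new[dst] += d * rank[src] / len(outs)
    rank = new

print(max(rank, key=rank.get))  # C, which receives links from both A and B
```

The ranks always sum to 1, and the fixed point is the stationary distribution of a random surfer on the graph.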
COMS 6998.015 CLOUD COMPUTING & BIG DAT | Sahu, Sambit
Cloud Computing and Big Data Systems
This is a graduate-level course on cloud computing and big data, with an emphasis on hands-on design and implementation. You will learn to design and build extremely large-scale systems and learn the underlying principles and building blocks used in the design of such large-scale applications. You will use real cloud platforms and services to learn the concepts and build such applications.
The first part of the course covers basic building blocks such as essential cloud services for web applications, cloud programming, virtualization, containers, Kubernetes, and microservices. We shall learn these concepts by using and extending capabilities available in real clouds such as Amazon AWS and Google Cloud.
The second part of the course will focus on the various stacks used in building an extremely large-scale system, such as (i) Kafka for event logging and handling, (ii) Spark and Spark Streaming for large-scale compute, (iii) Elasticsearch for extremely fast indexing and search, (iv) various NoSQL database services such as DynamoDB and Cassandra, (v) cloud-native development with Kubernetes, and (vi) cloud platforms for machine learning and deep learning based applications. Several real-world applications will be covered to illustrate these concepts and research innovations.
Students are expected to participate in class discussions, read research papers, work on three programming assignments, and conduct a significant course project. Given that this is a very hands-on course, students are expected to have a decent programming background.
Prerequisite: Good programming experience in any language, Concepts of Web Applications and Systems
Reading Material: lecture notes, research papers, reference textbooks, and engineering documentation
Grading: 3 Programming Assignments (35%), 2 Quizzes (25%), Project (40%)
COMS 6998.016 MACHINE LEARNING &CLIMATE | Kucukelbir, Alp
In this course, we will study two aspects of how ML interacts with Earth's climate.
First, we will investigate how ML can be used to tackle climate change. We will focus on use cases from transportation, manufacturing, food and agriculture, waste management, and atmospheric studies. We will ask questions like: what are the requirements for applying ML to such problems? How can we evaluate the effectiveness of our analyses?
Second, we will consider ML's own impact on the climate. We will focus on the energy and computation that goes into designing, training, and deploying modern ML systems. We will ask questions like: how can we accurately track and account for ML's own energy footprint? What strategies can we employ to minimize it?
By the end of this course, you will learn about modern statistical and causal ML methods and their applications to the climate. Our focus will be the modeling of real-world phenomena using probability models, with an emphasis on vision, time series forecasting, uncertainty quantification, and causality. In addition, you will gain a deeper understanding of the carbon footprint of ML itself and explore how to mitigate it.
COMS 6998.017 HIGH PERF MACH LEARNING | Dube, Parijat and El-Maghraoui, Kaoutar
During the past decades, the field of High-Performance Computing (HPC) has been about building supercomputers to solve some of the biggest challenges in science. HPC is where cutting edge technology (GPUs, low latency interconnects, etc.) is applied to the solution of scientific and data-driven problems.
One of the key ingredients of the current success of ML is the ability to perform computations on very large amounts of training data. Today, the application of HPC techniques to ML algorithms is a fundamental driver of the progress of artificial intelligence. In this course, you will learn HPC techniques that are typically applied to supercomputing software, and how they are applied to obtain the maximum performance out of ML algorithms. You will also learn about techniques for building efficient ML systems. The course is based on PyTorch, CUDA programming, and MPI.
Prerequisites
Knowledge of computer architecture and operating systems through coursework; C/C++: intermediate programming skills; Python: intermediate programming skills; knowledge of machine learning concepts and deep learning algorithms through coursework
Topics covered
- ML/DL and PyTorch basics; PyTorch performance and performance optimization in PyTorch
- Parallel performance modeling
- Intro to CUDA and CUDA programming
- Math libraries for ML (cuDNN)
- DNN architectures (CNN, RNN, LSTM, attention, transformers) in PyTorch
- Intro to MPI; distributed ML; distributed PyTorch algorithms, parallel data loading, and ring reduction
- Quantization and model compression
- Hardware/software co-design and co-optimization of DNNs
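As one example from the topic list, model compression via quantization can be sketched in a few lines (an illustrative, framework-free sketch; the weight values are made up, and real frameworks add per-channel scales and calibration):

```python
# Minimal symmetric int8 quantization: map floats into [-127, 127]
# with a single scale, then map back. The round-trip error is at
# most scale / 2 per weight.

def quantize(xs, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(x) for x in xs) / qmax  # symmetric scale
    q = [round(x / scale) for x in xs]      # small integers
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]          # made-up weights
q, scale = quantize(weights)
approx = dequantize(q, scale)
print(max(abs(a - b) for a, b in zip(weights, approx)))  # at most scale / 2
```

Storing the integers plus one scale uses a quarter of the memory of float32 weights, which is the basic trade-off the course examines alongside hardware-aware co-design.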
COMS 6998.018 READINGS LANGUAGE DESIGN | Stroustrup, Bjarne
This course involves reading, presenting, and discussing papers on programming language design. In the first lecture, I outline some dimensions of language design and give examples of aims and ideals that have driven the development of languages. I also present a (necessarily incomplete) list of classical design papers. In the second lecture, I present a paper for discussion to give an idea of how that may be done. This is not a comparative languages class, nor a type-theory class. The focus of presentations should be on ideals, aims, and principles, with language features presented simply as examples. For older languages it is possible and desirable to discuss how the ideas worked out in actual use and evolved. Uncritical "sales jobs" of languages, as are not uncommon in early descriptions of new languages, are not acceptable as presentations. You might, of course, choose to present your favorite language, but maintain a critical and objective perspective.