Spring 2022 Topics Courses

The below are course descriptions for the Spring 2022 Topics Courses. This page will be updated as new information is made available. The official record of what will be offered is listed on the Directory of Classes, so please use this page only as a resource in your course planning. Undergraduates should consult their CS faculty advisor to see if a course counts for their track. MS students should consult the Topics page (if a course is not listed there, consult your CS faculty advisor).

For questions regarding Data Science courses please email: DataScience-Registration@columbia.edu

 

COMS 4995.001 SEMANTIC REPRESENTATIONS | Bauer, Daniel
COMS 4995.002 UBIQUITOUS SEQUENCING | Pe’er, Itshack
COMS 4995.003 CAUSAL INFERENCE II | Bareinboim, Elias
COMS 4995.004 GEOMETRIC DATA ANALYSIS | Blumberg, Andrew
COMS 4995.005 DESIGN USING C++ | Stroustrup, Bjarne
COMS 4995.007 INTRO TO NETWORKS AND CRO | Chaintreau, Augustin
COMS 4995.008 RANDOMIZED ALGORITHMS | Roughgarden, Timothy
COMS 4995.009 NATUR ARTIFIC NEURAL NETW | Papadimitriou, Christos
COMS 4995.010 NAT ART NEURAL NETWORKS L | Papadimitriou, Christos
COMS 6998.001 ADV SPOKEN LANGUAGE PROCE | Hirschberg, Julia
COMS 6998.002 AI FOR SOFTWARE ENGINEERI | Ray, Baishakhi
COMS 6998.003 FUND SPEECH RECOGNITION | Beigi, Homayoon
COMS 6998.004 COMMUNICATION COMPLEXITY | Pitassi, Toniann
COMS 6998.005 SELF SUPERVISED LEARNING | Zemel, Richard
COMS 6998.006 QUANTUM COMPLEXITY CRYPTO | Yuen, Henry
COMS 6998.007 ADV TOPICS PROJ DEEP LEAR | Belhumeur, Peter
COMS 6998.008 CLOUD COMPUTING & BIG DAT | Sahu, Sambit
COMS 6998.009 PRACT DEEP LEARNING SYS P | Dube, Parijat
COMS 6998.010 MACHINE LEARNING & CLIMATE | Kucukelbir, Alp


COMS 4995.001 SEMANTIC REPRESENTATIONS | Bauer, Daniel

Syllabus

Many NLP tasks and applications require some level of understanding of the semantics (meaning) of linguistic expressions. How to represent semantics, and how to map surface linguistic expressions to semantic representations, is therefore an important part of NLP research. In this course, we will compare a variety of approaches to representing and inferring the meaning of words and sentences, including symbolic/logic-based representations and resources, as well as contemporary distributional and multi-modal approaches. Requirements include homework assignments, a paper presentation or summary paper, and a final project.
Prerequisites: Data Structures (COMS 3134 or COMS 3137) and Discrete Math (COMS 3203). Some experience with machine learning and neural network models. Some background in programming in Python is beneficial. COMS 4705 (Natural Language Processing) is strongly recommended.




COMS 4995.002 UBIQUITOUS SEQUENCING | Pe’er, Itshack

DNA sequences are life’s information system. The recent technological revolution allowing ubiquitous collection of such sequence data has opened up applications from viral identification to molecular barcoding to forensics to efficient long-term data storage and more. The class will present background on the available technologies along with an introduction to what can be gleaned from the data they produce. The class will include a significant hands-on component in hackathon format. The class is open to an interdisciplinary community of undergraduates and graduate students who will mutually benefit one another with their disciplinary skills. In particular, students with zero background in biology are welcome. Computing background is needed at only a very basic level (Intro CS, Computing in Context, AP CS, or instructor’s approval). A detailed description of a previous run of the class is available at https://elifesciences.org/articles/14258

 




COMS 4995.003 CAUSAL INFERENCE II | Bareinboim, Elias

Fall 2021 Syllabus




COMS 4995.004 GEOMETRIC DATA ANALYSIS | Blumberg, Andrew

Website | Syllabus

The goal of this class is to introduce approaches to analyzing data presented as finite metric spaces using ideas from algebraic topology and differential geometry. Prerequisites are a grounding in basic probability, statistics, and linear algebra. The class will focus on rigorous mathematical foundations and applications drawn from computational genomics.




COMS 4995.005 DESIGN USING C++ | Stroustrup, Bjarne

Link to full description

Informal description
Design cannot be understood in the abstract: to discuss design you need concrete examples – preferably examples of both good and bad design. Conversely, you cannot understand a programming language or library – or use it well – by just learning the rules for its individual features. You need to understand the general design ideas behind the language or library: its philosophy. The ISO C++ language and its standard library provide many concrete examples for the discussion of design. We will look at C++ from its earliest days through the current 2020 ISO standard (C++20). This year’s version of this course will place some emphasis on the C++ Core Guidelines effort to provide tool-and-library-supported guidelines for a modern style of C++ offering type and resource safety without loss of generality or performance.

This course involves a fair bit of reading, some programming, and some writing. Specific topics will be chosen from resource management (e.g., constructors and destructors), error handling (e.g., exceptions), generic programming (e.g. templates and concepts), compile-time computation, modularity, concurrency (threads and coroutines) and libraries (e.g. containers, algorithms, ranges, and smart pointers). Topics will be examined from various points of view, including usability, implementation models, teachability, performance, and real-world constraints…(read more)

Course Description and Prerequisites
This course explores the interactions among language design, library design, and program design in the context of ISO standard C++. It examines features provided from early C++ through C++20 and the design and programming techniques they support.

Requirements: Senior undergraduate, master’s, professional, or PhD standing. A basic understanding of C++ and experience with a software development project (in any language) would be an advantage.

 



COMS 4995.007 INTRO TO NETWORKS AND CRO | Chaintreau, Augustin

This course covers the fundamentals underlying information diffusion and incentives on networked applications. Applications include but are not limited to social networks, crowdsourcing, online advertising, rankings, and information networks like the world wide web, as well as areas where opinion formation and the aggregate behavior of groups of people play a critical role. Structural concepts introduced and covered in class include random graphs, small worlds, weak ties, structural balance, cluster modularity, preferential attachment, Nash equilibria, potential games, and bipartite graph matching. The class examines the following dynamics: link prediction, network formation, adoption with network effects, spectral clustering and ranking, spread of epidemics, seeding, social learning, routing games, all-pay contests, and truthful bidding.

 



COMS 4995.008 RANDOMIZED ALGORITHMS | Roughgarden, Timothy

Fall 2019 course website

 

 

COMS 4995.009 NATUR ARTIFIC NEURAL NETW | Papadimitriou, Christos

Syllabus



COMS 4995.010 NAT ART NEURAL NETWORKS L | Papadimitriou, Christos

Syllabus

 



COMS 6998.001 ADV SPOKEN LANGUAGE PROCE | Hirschberg, Julia

Course Website

This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications.



COMS 6998.002 AI FOR SOFTWARE ENGINEERI | Ray, Baishakhi

description coming soon…



COMS 6998.003 FUND SPEECH RECOGNITION | Beigi, Homayoon

Fundamentals of Speech Recognition is a comprehensive course covering all aspects of automatic speech recognition, from theory to practice. Topics covered in some detail include the anatomy of speech, signal representation, phonetics and phonology, signal processing and feature extraction, probability theory and statistics, information theory, metrics and divergences, decision theory, parameter estimation, clustering and learning, transformations, hidden Markov modeling, language modeling and natural language processing, search techniques, neural networks, support vector machines, and other recent machine learning techniques used in speech recognition. Several open-source speech recognition software packages are also introduced, with detailed hands-on projects using Kaldi to produce a fully functional speech recognition engine. The lectures cover theoretical aspects as well as practical coding techniques. The course is graded on a project: the midterm (40% of the grade) is a two-page proposal for the project, and the final (60% of the grade) is an oral presentation of the project plus a six-page conference-style paper describing the results of the research project. The instructor uses his own textbook for the course: Homayoon Beigi, “Fundamentals of Speaker Recognition,” Springer-Verlag, New York, 2011. The slides for each lecture are made available to students every week.

 

 


COMS 6998.004 COMMUNICATION COMPLEXITY | Pitassi, Toniann

description coming soon…

 


COMS 6998.005 SELF SUPERVISED LEARNING | Zemel, Richard

description coming soon…




COMS 6998.006 QUANTUM COMPLEXITY CRYPTO | Yuen, Henry

Course Website

This is an advanced, PhD-level topics class in the theory of quantum computing, focusing in particular on cutting-edge topics in quantum complexity theory and quantum cryptography. The theme of the topics this year will be

Complexity of quantum states and state transformations.

An n-qubit quantum state appears to be exponentially more complex than an n-bit string; a classical description of such a state requires writing down 2^n complex amplitudes in general. Here are some questions that will motivate our explorations:
  • How does this complexity manifest itself in different information-processing tasks?
  • How can we quantify this complexity?
  • Can this complexity be used for cryptographic applications?
  • Does the complexity of quantum states relate to traditional concepts in complexity theory such as circuit complexity or space complexity or time complexity?

A list of possible topics:

  • Shadow tomography of quantum states
  • Quantum state synthesis/unitary synthesis
  • Hamiltonian learning
  • Quantum key distribution/quantum money
  • QMA(2) and the power of unentanglement
  • State complexity, AdS/CFT, and quantum gravity
  • Quantum state complexity and statistical zero knowledge
  • Pseudorandom states and unitaries
  • Classical vs. quantum proofs

The following notes from Scott Aaronson will be frequently consulted: https://www.scottaaronson.com/barbados-2016.pdf.

The goal is to explore the results and (more importantly) the questions in this field. Course work may include: scribe notes, a few assignments, and a course project.

Prerequisites

Must have taken: Basic linear algebra and basic probability theory.

You must have taken at least one of:

  • An upper-level CS theory course such as complexity theory/cryptography/algorithms, or
  • Quantum computing, or
  • Quantum physics.

It is not required that you have taken a course in quantum computing/quantum information before, but it would definitely be helpful. The lectures will introduce the concepts as needed, but will go rather quickly.

The course will expect high levels of mathematical maturity and comfort with proofs. To get a sense of what this course is like, take a look at the lecture notes from this Fall 2020 topics course.

 




COMS 6998.007 ADV TOPICS PROJ DEEP LEAR | Belhumeur, Peter

Syllabus: 

This is a seminar course in which the students read, present, and discuss research papers on deep learning. The focus will be mostly on applications in computer vision, but topics in natural language processing, language translation, and speech recognition will also be read and discussed. It is expected that students taking the course will have prior experience with deep learning and neural network architectures. There will be no assignments other than reading and presentations. However, there will be a final project that students will work on throughout the duration of the semester. Enrollment is capped at 25 students. Instructor permission is required to register.

 



COMS 6998.008 CLOUD COMPUTING & BIG DAT | Sahu, Sambit

Cloud Computing and Big Data Systems
This is a graduate-level course on cloud computing and big data with an emphasis on hands-on design and implementation. You will learn to design and build extremely large scale systems and learn the underlying principles and building blocks in the design of such large scale applications. You will use real cloud platforms and services to learn the concepts and build such applications.

The first part of the course covers basic building blocks such as essential cloud services for web applications, cloud programming, virtualization, containers, Kubernetes, and micro-services. We shall learn these concepts by using and extending capabilities available in real clouds such as Amazon AWS and Google Cloud.

The second part of the course will focus on the various stacks used in building an extremely large scale system, such as (i) Kafka for event logging and handling, (ii) Spark and Spark Streaming for large scale compute, (iii) Elasticsearch for extremely fast indexing and search, (iv) various NoSQL database services such as DynamoDB and Cassandra, (v) cloud-native development with Kubernetes, and (vi) cloud platforms for machine learning and deep learning based applications. Several real-world applications will be covered to illustrate these concepts and research innovations.

Students are expected to participate in class discussions, read research papers, work on three programming assignments, and conduct a significant course project. Given that this is a very hands-on course, students are expected to have a solid programming background.

Prerequisite: Good programming experience in any language, Concepts of Web Applications and Systems

Reading Material: Lecture notes, reading papers, reference textbooks, and engineering documentation

Grading: 3 Programming Assignments (35%), 2 Quizzes (25%), Project (40%)

 




COMS 6998.009 PRACT DEEP LEARNING SYS P | Dube, Parijat

This course will cover several topics in the performance evaluation of machine learning and deep learning systems. Major topics covered in the course:

  • Algorithmic and system-level introduction to deep learning (DL)
  • DL training algorithms, network architectures, and best practices for performance optimization
  • The ML/DL system stack on cloud
  • Tools and benchmarks (e.g., DAWNBench) for performance evaluation of ML/DL systems
  • Practical performance analysis using standard DL frameworks (TensorFlow, PyTorch) and resource monitoring tools (e.g., nvidia-smi)
  • Performance modeling to characterize scalability with respect to workload and hardware
  • Performance considerations with special techniques such as transfer learning, semi-supervised learning, and neural architecture search

Emphasis will be on gaining working knowledge of tools and techniques to evaluate the performance of ML/DL systems on cloud platforms. The assignments will involve running experiments using standard DL frameworks (TensorFlow, PyTorch) and working with open-source DL technologies. Students will gain practical experience working on different stages of the DL life cycle (development and deployment) and understanding and addressing related system performance issues.




COMS 6998.010 MACHINE LEARNING & CLIMATE | Kucukelbir, Alp

Course Website | Syllabus PDF

In this course, we will study two aspects of how machine learning (ML) interacts with Earth’s climate.

First, we will investigate how ML can be used to tackle climate change. We will focus on use cases from transportation, manufacturing, food and agriculture, waste management, and atmospheric studies. We will ask questions like: what are the requirements for applying ML to such problems? How can we evaluate the effectiveness of our analyses?

Second, we will consider ML’s own impact on the climate. We will focus on the energy and computation that go into designing, training, and deploying modern ML systems. We will ask questions like: how can we accurately track and account for ML’s own energy footprint? What strategies can we employ to minimize it?

By the end of this course, you will learn about modern statistical and causal ML methods and their applications to the climate. We will model real-world phenomena using probability models, with a focus on vision, time-series forecasting, uncertainty quantification, and causality. In addition, you will gain a deeper understanding of the carbon footprint of ML itself and explore how to mitigate it.