# COMS 4774 Spring 2021 (Unsupervised Learning)

## Lecture & reading schedule

Instructions for connecting to the “Write” whiteboard are available in the introductory lecture slides.

**High-dimensional data**

- concentration of measure (1/12, 1/14, 1/19)
- random linear maps (1/19, 1/21)
- high-dimensional distributions (1/21, 1/26, 1/28, 2/2)
- subspace embeddings and fast JLT (2/4, 2/9)

**Low-rank approximations**

- singular value decomposition (2/11, 2/16, 2/18)
- sums of random matrices (2/23, 2/25)
- planted partition models (3/9, 3/11)
- spectral graph theory (3/11, 3/16, 3/18, 3/23)
- semi-supervised learning (3/23)

**Higher-order interactions**

- higher-order moments (3/25)
- multivariate moment tensors (3/30, 4/1)
- tensor decompositions (4/1, 4/6, 4/8)
- tensor PCA (4/13)

**Extra**

- self-supervised learning (4/15)

## Syllabus

### Course description

COMS 4774 is a graduate-level introduction to unsupervised machine learning. This class covers classical and modern algorithmic techniques for problems in machine learning beyond traditional supervised learning, including fitting statistical models, dimension reduction, and exploratory data analysis. **This class will emphasize the theoretical analysis of algorithms used for these tasks.**

I previously taught this course material as COMS 4772 (“Advanced Machine Learning”). Nakul Verma teaches COMS 4774 in other semesters with a slightly different slate of topics.

### Prerequisites

You must know multivariate calculus, linear algebra, basic probability, and discrete mathematics. You must be familiar with basic algorithmic design and analysis. You must have general mathematical maturity and be comfortable **reading and writing mathematical proofs**.

COMS 4771 is not a prerequisite, but it is recommended. The mathematical prerequisite topics for COMS 4771 will be assumed.

The “math refresher” assignment from a previous instantiation of the course should give you an idea of what will be expected.

#### Online resources for course prerequisites

If you are unsure about whether you satisfy the prerequisites for this course (or would like to “page-in” this knowledge), please check the following links.

*Multivariate calculus*: videos by 3Blue1Brown; textbook by Marsden and Weinstein; MIT open courseware.
*Linear algebra*: videos by 3Blue1Brown; immersive linear algebra; lecture notes from UC Davis; MIT open courseware; book chapter by Goodfellow, Bengio, and Courville; additional review notes
*Probability*: textbook by Grinstead and Snell; book chapter by Goodfellow, Bengio, and Courville; review notes by Arian Maleki and Tom Do
*Algorithms*: Chapter 0 of textbook by Dasgupta, Papadimitriou, and Vazirani (with discussion of asymptotic notation); “booksite” by Sedgewick and Wayne
*Discrete math*: MIT open courseware; lecture on graph theory
*Mathematical maturity*: guidelines for good mathematical writing, from HMC; notes on writing mathematics well, from HMC; notes on writing math in paragraph style, from SJSU; notes on writing proofs, from SJSU; how to write a math solution, from AoPS

### Course content

#### Topics

*Next, I will explain eigenvectors. These are just vectors, and we all know what vectors are—they’re things that go someplace, right? So you take regular vectors and make them eigen, and you get eigenvectors. Now let’s tackle dimensionality reduction. That simply means that you take a certain dimensionality and then you reduce it. So—are we good? Good!* – Ian Frazier, “It’s the Data, Dolts”

This list of topics is tentative and subject to change.

- High-dimensional data
- Concentration of measure
- Random linear maps
- High-dimensional Gaussians
- Subspace embeddings

- Low-rank approximations
- Singular value decomposition
- Sums of random matrices
- Planted partition models
- Spectral graph theory
- Semi-supervised learning

- Higher-order interactions
- Higher-order moments
- Multivariate moment tensors
- Tensor decompositions

#### Readings

Readings will be assigned from various sources, including the following text:

#### Assignments

The overall course grade is comprised of:

- homework assignments (35%)
- final project (35%)
- class participation (30%)

Please submit all assignments by the specified due dates. Extensions are generally only granted for medical reasons. (Please ask your academic advisor to confirm documentation from a physician / medical practitioner, and then ask them to email me their confirmation.)

All written assignments should be neatly typeset as PDF documents. This will make grading much easier! You can use LaTeX, Microsoft Word, or any other system that produces high-quality PDFs with neatly typeset equations and mathematics. If you have not used LaTeX before, or if you only have a passing familiarity with it, it is recommended that you read and complete the lessons and exercises in The Bates LaTeX Manual or on learnlatex.org. This video by Ryan O’Donnell on writing math in LaTeX is also recommended.

### Disability services

If you require accommodations or support services from Disability Services, please make necessary arrangements in accordance with their policies within the first two weeks of the semester.

### Academic rules of conduct

You are expected to adhere to the Academic Honesty policy of the Computer Science Department, as well as the following course-specific policies.

#### Collaboration on homework assignments

You are welcome and encouraged to discuss homework assignments with fellow students. Your discussions should respect the following rules.

- Homework assignments should be completed
**individually** or in **groups of two students (i.e., pairs)**.
- We will provide instructions for submitting assignments as a group.
- Each group member must contribute to every part of the assignment; no one should be just “along for the ride”.
- Each group member must take responsibility for the
*entire* submitted write-up.
- The submitted write-up should be completely in your own words. If you need to quote or reference a source, you must include proper citations in your write-up.

- Discussion
*between* groups may include brainstorming and verbally discussing possible solution approaches, but **must not go as far as one person telling another how to solve the problem**.
- You may not take any notes (whether handwritten or typeset) from the discussions.
- Any written/electronic discussions (e.g., over messaging platforms, email) should be discarded/deleted immediately after they take place.

- You may not look at another group’s homework write-up/solutions (whether partial or complete).
- You may not show your homework write-up/solutions (whether partial or complete) to another group.

Note that you are not required to work on homework assignments in groups. In fact, I generally think it is better to work on homework assignments individually. However, this semester, I do encourage working in groups, as the COVID-19 situation may make it difficult to otherwise interact with fellow classmates.

#### Use of outside references on homework assignments

Outside reference materials and sources (i.e., texts and sources beyond the assigned reading materials for the course) may be used on homework *only if given explicit written permission from the instructor* and if the following rules are followed.

- Any outside reference must be acknowledged and cited in the write-up.
- Sources obtained by searching the literature/internet for answers or hints on homework assignments are
*never permitted*.
- You are permitted to use texts and sources on course prerequisites (e.g., a linear algebra textbook).
- If you need to look up a result in such a source, provide a citation in your homework write-up.

- If you
*inadvertently* come across a solution to (or substantial hint about) a problem, you must:
- acknowledge this source and document the circumstance in your homework write-up;
- produce a solution without looking at the source; and
- as always, write your solution in your own words.

- If you have already seen one of the homework problems before (e.g., in a different course), please re-solve the problem without referring to any previous solutions.
- In your write-up, please also indicate that you had seen the problem before. (You won’t lose any credit for this; it would just be helpful for us to know about this fact.)

#### Violations

Violation of any portion of these policies will result in a penalty to be assessed at the instructor’s discretion (e.g., a zero grade for the assignment in question, a failing letter grade for the course). All violations are reported to Student Conduct and Community Standards.

### Getting help

You are encouraged to use office hours and Piazza to discuss and ask questions about course material and reading assignments, and to ask for high-level clarification on and possible approaches to homework problems. If you need to ask a detailed question specific to your solution, please do so on Piazza and mark the post as “private” so only the instructors can see it.

Questions, of course, are also welcome during lecture. If something is not clear to you during lecture, there is a chance it may also not be clear to other students. So please raise your hand to ask for clarification during lecture. Some questions may need to be handled “off-line”; we’ll do our best to handle these questions in office hours or on Piazza.

When asking questions on Piazza or in office hours, please be as specific as possible and give all of the relevant context. Questions like “can you explain X” and “how do I solve Y” are not questions that we can usefully answer on Piazza or in office hours. We will have a better chance of providing a useful answer to more specific questions that are accompanied with relevant context: e.g., “It seems to me that Theorems X and Y from last week’s lecture (discussed in textbook Z) have contradicting conclusions. I believe Theorem X applies in the following premise […], but applying Theorem Y to the same premise gives an opposite conclusion. Why does Theorem Y not apply?”

## Note about enrollment

Enrollment for this course is managed by the CS front office by putting everyone on the waitlist initially and then admitting students into the class manually (but not by me). Please contact CS student services (advising@cs or gradvising@cs, depending on whether you are an undergraduate or graduate student) for information about the waitlist.

**More information from the registrar:**

The official Change of Program Period (course shopping period) begins on Monday, January 11, and ends on Friday, January 22.

Canvas course sites will be set to be accessible to anyone with a Columbia UNI and password so that all students can access the Zoom class meeting links.

The Zoom class meeting links should be available in Courseworks under “Zoom Class Sessions”.

**About auditing:**

A few folks have asked about auditing. If you would like to audit, use your browser to bookmark the Zoom links for the lectures. However, *auditors should not submit assignments, attend office hours, post on Piazza, etc.*; this is because the time/effort of the course staff is budgeted only for the enrolled students.