## Algorithms for Massive Data (COMS E6998-9)

- Instructor: Alexandr Andoni
- TAs: Marshall Ball, Peilin Zhong
- Time: TT 2:40-3:55pm
- Room: Mudd 825
- Office hours: Mon 3:30-5:30pm Alex in Mudd 420, Tue
10am-12pm Peilin in TA room, Wed 3-5pm Marshall in Mudd 516

#### Class Description

**See the official course information.**

Modern Data presents both a big promise but also a big challenge --- how are we to extract that promise? The classic algorithms for processing data are often insufficient to deal with the datasets of modern sizes. For example, a quadratic-time algorithm means that 10x increase in data size requires a 100x increase in resources!

This class will focus on algorithmic techniques to tackle massive datasets efficiently. We will see computational models for how to think of massive data processing, and explore core algorithmic ideas, such as how to summarize complex data via sampling and small sketches techniques.

#### Prerequisites

Self-Evaluation test:
you must complete the self-evaluation test asap (ideally before
the class starts) to confirm that you have the sufficient
background for the class, and identify potential parts to brush
up before the class. It is mandatory, but you do not have to
turn in solutions. After you are done with it, take a look at
the solutions.

Mathematical maturity is a must: the class is based on theoretical ideas and is proof-heavy. You are expected to be able to read and write formal mathematical proofs. Some familiarity with algorithms and randomness will be assumed as well. COMS 4231 (Analysis of Algorithms) or equivalent is useful, but not required if you have solid math background.

Undergradute students and students from other departments are welcome.

#### (Tentative) List of Topics

- Sketching/streaming algorithms
- Dimension reduction
- Numerical linear algebra
- Compressed sensing
- Streaming for graphs
- Distribution testing
- Sublinear-time algorithms/property testing
- Parallel computation