Algorithms for Massive Data (COMS E6998)

Class Description

See the official course information.

Modern Data presents both a big promise but also a big challenge --- how are we to extract that promise? The classic algorithms for processing data are often insufficient to deal with the datasets of modern sizes. For example, a quadratic-time algorithm means that 10x increase in data size requires a 100x increase in resources!

This class will focus on algorithmic techniques to tackle massive datasets efficiently. We will see computational models for how to think of massive data processing, and explore core algorithmic ideas, such as how to summarize complex data via sampling and small sketches techniques.


Self-Evaluation test: you must complete the self-evaluation test asap (ideally before the class starts) to confirm that you have the sufficient background for the class, and identify potential parts to brush up before the class. It is mandatory, but you do not have to turn in solutions. After you are done with it, take a look at the solutions.

Mathematical maturity is a must: the class is based on theoretical ideas and is proof-heavy. You are expected to be able to read and write formal mathematical proofs. Some familiarity with algorithms and randomness will be assumed as well. COMS 4231 (Analysis of Algorithms) or equivalent is useful, but not required if you have solid math background.

Undergradute students and students from other departments are welcome.

(Tentative) List of Topics