COMS 6998: Sublinear Time Algorithms in Learning and Property Testing
Spring 2014

SYLLABUS

General Information | Introduction | Prerequisites | Syllabus | Grading | Readings | Problems | Final Projects | Scribe Notes | Academic Honesty Policy

General Information

Instructor: Rocco Servedio. Office hours: Fri 9:30-11:30, CSB 517.
Teaching assistant: Clement Canonne (OH: Tues 2-4, CSB 516)
Classroom: 644 Mudd
Time: Wed 4:10-6pm.
Course website (this page): http://www.cs.columbia.edu/~rocco/Teaching/S14/
Course email (use for all course-related issues): coms6998learntests14 at gmail dot com.

Administrative Note: This course can be used as a theory elective for the Ph.D. program in computer science, as a track elective course for MS students in the "Foundations of Computer Science" track or the "Machine Learning" track, or as a track elective for undergraduates in the "Foundations" track. Auditing the class requires the approval of the instructor.

Introduction

Recently there has been a lot of glorious hullabaloo about Big Data and how it is going to revolutionize the way we work, play, eat and sleep. As part of the general excitement, it has become clear that for truly massive datasets and digital objects of various sorts, even algorithms which run in linear time (linear in the size of the relevant dataset or object) may be much too slow.

In this course we will study sub-linear time algorithms which are aimed at helping us understand massive objects. The algorithms we study inspect only a tiny portion of an unknown object and have the goal of coming up with some useful information about the object. Algorithms of this sort provide a foundation for principled analysis of truly massive data sets and data objects.

In particular we will consider

We will study these algorithms for three different types of objects: Boolean functions, graphs, and probability distributions.

Prerequisites

There are no specific prerequisities for the course, but you should be mathematically mature and comfortable with the usual style of theoretical computer science arguments and analyses. Prior graduate level coursework in theoretical computer science will be helpful, but the course should be accessible to mathematically strong undergraduates as well. It may be helpful to have taken COMS 4252 (the introductory learning theory course), but this is not a requirement -- this course has been designed to be accessible without COMS 4252.

In a bit more detail, we will freely use standard tail bounds for sums of independent random variables (i.e. Chernoff bounds and Hoeffding bounds) so it may be helpful to come into the course familiar with these basic tools. Here is a quick summary of these results that has been helpfully prepared by Clement.

Rough Syllabus

Here is a rough anticipated syllabus for the course. New topics may be added in and some of the topics listed below may not be covered depending on how the semester goes. The majority of the course will be lectures given by the instructor on the topics below. Pointers to relevant readings are given below. The last few lectures will be presentations given by the students as part of their final projects.

Introduction:

Boolean Functions: Graphs: Distributions:

Grading and Requirements

The requirements of the course are as follows:

Readings

There is no required textbook for the course. Pointers to required and recommended readings (papers and occasional book chapters) will be posted on the ``Syllabus'' portion of this webpage above as the semester progresses.

Problems

Here is a list of the "official problems".

Final Projects

Click here for more information about final projects.

Scribe Notes

Click here to obtain scribe notes from the various lectures.

Academic Honesty Policy

Students are expected to adhere to the Academic Honesty policy of the Computer Science Department; this policy can be found in full here. Please contact the instructor with any questions.