**Dates**: August 29-31, 2018

**Venue**: CS Lounge in Computer Science Building, Columbia University

The goal of the Columbia TRIPODS Bootcamp Lectures is to introduce students to the computational, mathematical, and statistical foundations of data science, and to present an overview of the research being pursued at the Columbia TRIPODS Institute.

The lectures will introduce topics at an introductory level, so prior exposure to the lecture topics is not necessary. An undergraduate-level background in a mathematical subject (e.g., theoretical computer science, mathematics, statistics) will be assumed. In particular, we’ll assume a basic knowledge of linear algebra and probability theory; familiarity with large deviation bounds (e.g., Chernoff bounds) will be helpful.

The lectures are open to all, but we kindly request that you complete the following registration form so we get an accurate headcount.

- Registration form: https://goo.gl/forms/WmAyJQuCvGTFau4i1

Wednesday, August 29, 2018

- 9:30 am - 10:30 am: Optimization (John Wright)
- 10:30 am - 11:00 am: Break
- 11:00 am - 12:00 pm: Optimization (John Wright)
- 12:00 pm - 2:00 pm: Lunch break
- 2:00 pm - 3:00 pm: Statistical learning theory (Daniel Hsu)
- 3:00 pm - 3:30 pm: Break
- 3:30 pm - 4:30 pm: Statistical learning theory (Daniel Hsu)

Thursday, August 30, 2018

- 9:30 am - 10:30 am: Optimization (John Wright)
- 10:30 am - 11:00 am: Break
- 11:00 am - 12:00 pm: Statistical learning theory (Daniel Hsu)
- 12:00 pm - 2:00 pm: Lunch break
- 2:00 pm - 3:00 pm: Active learning (Chris Tosh)
- 3:00 pm - 3:30 pm: Break
- 3:30 pm - 4:30 pm: Sublinear algorithmic tools (Alex Andoni)

Friday, August 31, 2018

- 9:30 am - 10:30 am: Active learning (Chris Tosh)
- 10:30 am - 11:00 am: Break
- 11:00 am - 12:00 pm: Sublinear algorithmic tools (Alex Andoni)
- 12:00 pm - 2:00 pm: Lunch break
- 2:00 pm - 3:00 pm: Sublinear algorithmic tools (Alex Andoni)
- 3:00 pm - 3:30 pm: Non-stationary streaming PCA (Apurv Shukla)

Each day, we’ll have coffee & bagels in the mornings at 9:15 am, and also have some snacks available just before the afternoon lectures begin.

TBD

Starting with classic dimension reduction methods, researchers have developed powerful tools for storing, communicating, and accessing pieces of data far more efficiently than by simply keeping the full data. These tools, often studied in the area of sublinear algorithms (e.g., sketching and streaming), are a form of functional compression: we store just enough about the data to be useful for particular tasks. Most importantly, these tools have led to new algorithms with much better computational efficiency.
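As a small illustration of the flavor of these tools (not code from the lectures), here is a sketch of random-projection dimension reduction in the spirit of the Johnson-Lindenstrauss lemma: a scaled Gaussian matrix compresses high-dimensional points while approximately preserving pairwise distances. The sizes below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 1000, 200  # n points in d dimensions, projected down to k

X = rng.normal(size=(n, d))

# A scaled Gaussian matrix preserves pairwise distances up to a (1 +/- eps)
# factor with high probability once k = O(log(n) / eps^2).
P = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ P  # the compressed "sketch" of the data

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig  # should be close to 1
```

The point is that `Y` occupies a fifth of the space of `X`, yet still supports distance-based tasks (nearest neighbors, clustering) approximately.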

Statistical learning theory provides a rigorous statistical framework in which to study machine learning algorithms. These lectures will introduce the framework and the theoretical tools that have been used to analyze some important machine learning algorithms.
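To make the framework concrete (this toy example is not from the lectures), the following sketch runs empirical risk minimization over a finite class of threshold classifiers on noisy one-dimensional data, then estimates the chosen classifier's true risk on a large held-out sample; the gap between the two is what uniform convergence bounds control.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy distribution: X ~ Uniform[0,1], label 1{x > 0.5} flipped with prob 0.1.
def sample(n):
    x = rng.uniform(size=n)
    y = (x > 0.5).astype(int)
    flip = rng.uniform(size=n) < 0.1
    return x, np.where(flip, 1 - y, y)

thresholds = np.linspace(0.0, 1.0, 101)  # finite hypothesis class h_t(x) = 1{x > t}

def risks(x, y):
    # 0-1 loss of every threshold classifier on the sample (x, y)
    preds = (x[None, :] > thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

x_tr, y_tr = sample(200)        # training sample
x_te, y_te = sample(100_000)    # large sample to approximate true risk

best = np.argmin(risks(x_tr, y_tr))   # empirical risk minimizer (ERM)
test_risk = risks(x_te, y_te)[best]   # its (estimated) true risk
```

With 10% label noise, the Bayes risk is 0.1, and the ERM's true risk should land close to it; learning theory quantifies how close, as a function of the sample size and the complexity of the hypothesis class.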

In many scenarios where a classifier is to be learned, it is easy to collect unlabeled data but costly to acquire labels. This has motivated the study of active learning, in which a learner is allowed to adaptively query data points for their labels with the objective of finding a low-error classifier with as few queries as possible. In these lectures, we will examine several approaches to active learning with a focus on their theoretical guarantees.
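A classic toy example of the potential savings (again, a sketch, not the lectures' material): for one-dimensional threshold classifiers, adaptively querying labels by binary search over a sorted unlabeled pool locates the threshold with about log2(n) label queries, versus n for passive labeling.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unlabeled pool; labels come from a hidden threshold and are queried lazily.
pool = np.sort(rng.uniform(size=1024))
true_t = 0.37
labels = (pool > true_t).astype(int)  # the labeling oracle

# Binary search: each query halves the set of thresholds consistent with the
# labels seen so far, so ~log2(n) queries suffice instead of n.
lo, hi, queries = 0, len(pool) - 1, 0
while lo < hi:
    mid = (lo + hi) // 2
    queries += 1          # one label query to the oracle
    if labels[mid] == 1:
        hi = mid          # threshold is at or below pool[mid]
    else:
        lo = mid + 1      # threshold is above pool[mid]

learned_t = pool[lo]  # first pool point labeled 1
```

Here 10 queries pin down the threshold among 1024 points. The lectures examine when and why such exponential savings do (and do not) extend to richer hypothesis classes.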

The Columbia TRIPODS Institute aims to articulate methodological foundations for data science, spanning mathematics, statistics, and computing. Our emphasis is on foundations to support practice, through the analysis of successful heuristics, the development of well-structured computational toolkits, and the development of theory to support the entire cycle of data science. See the Columbia TRIPODS website for more information.

We thank the National Science Foundation for financial support through award CCF-1740833, and Elaine Roth from the Columbia CS department for administrative support.