For Fall 2017 the usual time for the meetings will be Fridays at 12:00 in the CS Conference Room. Abstracts for talks are given below the talk schedule.

Yuchen Zhang

Stanford University

**Abstract:**
Learning state-of-the-art models in AI often requires minimizing a non-convex loss function. For these problems, traditional algorithms guarantee to find locally optimal solutions, but they can be globally sub-optimal. In this talk, we explore two algorithmic ideas that achieve guarantees beyond local optimality. First, we study the idea of injecting large random noise to stochastic gradient descent. For general non-convex functions, we show that the algorithm asymptotically converges to the global minimum, and it escapes all reasonably "shallow" local minima in polynomial time. For empirical risk minimization, we show that (under certain conditions) the algorithm efficiently finds a local minimum of the population risk, escaping all local minima that only exist in the empirical risk.

Second, we study the idea of convexifying convolutional neural networks (CNNs). By capturing the non-linearity and the parameter sharing of CNN in a convex manner, we are able to relax the learning of CNN to a convex optimization problem. For learning two-layer CNNs, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN. Then we extend the algorithm to learning deeper CNNs. Empirically, the convexified CNN outperforms classical CNNs trained by back-propagation, and achieves performance competitive with other non-convex methods.

Huacheng Yu

Harvard University

**Abstract:**
In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players Bob his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result it presents Bob with the next piece. This model has a closer and more natural correspondence to dynamic data structures than the classic communication models do and hence presents a new perspective on data structures.

We apply the online communication model to data structure lower bounds by studying the Group Range Problem, a dynamic data structure problem. This problem admits an O(log n)-time worst-case data structure. Using online communication complexity, we prove a tight cell-probe lower bound: spending o(log n) (even amortized) time per operation results in at best an exp(-delta^2 n) probability of correctly answering a (1/2+delta)-fraction of the n queries.

In this talk, I will present a lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a higher lower bound.

Joint work with Josh Alman and Joshua R. Wang.

Christos Papadimitriou

Columbia University

**Abstract:**

Sepehr Assadi

University of Pennsylvania

**Abstract:**
We study the maximum k-set coverage problem in the following distributed setting. A collection of input sets S_1, ..., S_m over a universe [n] is partitioned across p machines and the goal is to find k sets whose union covers the most number of elements. The computation proceeds in rounds. In each round, all machines simultaneously send a message to a central coordinator who then communicates back to all machines a summary to guide the computation for the next round. At the end of the last round, the coordinator outputs the answer. The main measures of efficiency in this setting are the approximation ratio of the returned solution, the communication cost of each machine, and the number of rounds of computation.

Our main result is an asymptotically tight bound on the tradeoff between these three measures for the distributed maximum coverage problem: any r-round protocol for this problem either incurs a communication cost of m^{Omega(1/r)} or only achieves an approximation factor of k^{Omega(1/r)}; both these factors are also tight. We further use our results in this distributed setting to obtain new bounds for the maximum coverage problem in two other main models of computation for massive datasets, namely, the dynamic streaming model and the MapReduce model.

Based on joint work with Sanjeev Khanna (to appear in SODA'18)

Mikkel Thorup

University of Copenhagen

**Abstract:**
Joint work with Søren Dahlgaard and Mathias BækTejs Knudsen

We want a random function S mapping subsets A of [u] into vectors S(A) of size t, such that similarity is preserved. More precisely: Given two subsets A,B of [u] define X_i = [S(A)[i] = S(B)[i]] and X = X_1 + X_2 + .. + X_t. We want to have E(X) = t * J(A,B), where J(A,B) = |A\cap B|/|A\cup B| and furthermore to have strong concentration guarantees (i.e. Chernoff-style bounds) for X. This is a fundamental problem which has found numerous applications in data mining, large-scale classification, computer vision, similarity search, etc. via the classic MinHash algorithm. The vectors S(A) are also called sketches.

The seminal t*MinHash algorithm uses t random hash functions h_1,...,h_t, and stores min(h_1(A)),..., min(h_t(A)) as the sketch of A. The main drawback of MinHash is, however, its O(t * |A|) running time, and finding a sketch with similar properties and faster running time has been the subject of several papers. Addressing this, Li et al. [NIPS'12] introduced one permutation hashing (OPH), which creates a sketch of size t in O(t + |A|) time, but with the drawback that possibly some of the t entries are "empty" when |A| = O(t). One could argue that sketching is not necessary in this case, however the desire in most applications is to have one sketching procedure that works for sets of all sizes. Therefore, filling out these empty entries is the subject of several follow-up papers initiated by Shrivastava and Li [ICML'14]. However, these "densification" schemes fail to provide good concentration bounds exactly in the case |A| = O(t), where they are needed.

In this paper we present a new sketch which obtains the best of both worlds. That is, a fast O(t log t + |A|) running time while getting essentially the same strong concentration bounds as MinHash. Our new sketch can be seen as a mix between sampling with replacement and sampling without replacement. We demonstrate the power of our new sketch by considering popular applications in large-scale classification with linear SVM as introduced by Li et al. [NIPS'11] as well as approximate similarity search using the LSH framework of Indyk and Motwani [STOC'98]. In particular, for the j_1,j_2-approximate similarity search problem on a collection of n sets we obtain a data-structure with space usage O(n^{1+rho} log n + sum_{A \in C} |A|) and O(n^rho log n + |Q|) time for querying a set Q compared to a $(n^rho log n * |Q|) query time of the classic result of Indyk and Motwani.

Pravesh Kothari

Institute for Advanced Study

**Abstract:**
In this talk, I'll present efficient algorithms for estimating low-degree moment tensors of an unknown distribution in the presence of adversarial outliers. The guarantees of this algorithm improve significantly over the best previous results. These strong guarantees allow us to obtain improved outlier-robust algorithms for parameter recovery in well-studied latent variable statistical models such as Independent Component Analysis and Mixtures of Gaussians.

Our algorithms are all based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem:
Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that has the smallest statistical distance to the empirical distribution of the adversarially-corrupted input sample.

Based on joint work with David Steurer.

Benjamin Miller

University of Wisconsin--Madison

**Abstract:**
Most work in mechanism design assumes that buyers are risk neutral; some considers risk aversion arising due to a non-linear utility for money. Yet behavioral studies have established that real agents exhibit risk attitudes which cannot be captured by any expected utility model. We initiate the study of revenue-optimal mechanisms under behavioral models beyond expected utility theory. We adopt a model from prospect theory which arose to explain these discrepancies and incorporates agents under-weighting uncertain outcomes. In our model, an event occurring with probability x < 1 is worth strictly less to the agent than x times the value of the event when it occurs with certainty.

In this talk, I will introduce prospect theory and discuss the impact of prospect-theoretic buyer behavior on revenue-optimal mechanism design. In contrast to the risk-neutral setting, the optimal mechanism may be randomized and appears challenging to find, even for a single buyer and a single item for sale. Nevertheless, we give a characterization of the optimal mechanism which enables positive approximation results. In particular, we show that under a reasonable bounded-risk-aversion assumption, posted pricing obtains a constant approximation. Notably, this result is "risk-robust" in that it does not depend on the details of the buyerâ€™s risk attitude. Finally, I will discuss a dynamic setting, in which the buyer is uncertain about his future value. In contrast to positive results for a risk-neutral buyer, we show that the buyer's risk aversion may prevent the seller from approximating the optimal revenue in a risk-robust manner.

Lin F. Yang

Princeton University

**Abstract:**
We present data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1,2,...,Δ}^d. Our algorithms use k/eps^2 * poly(d log Δ) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the k-median cost of the coreset (1+epsilon)-approximates the cost of the streamed point set. We can use this coreset to compute a (1+epsilon)-approximation for the k-median problem by any efficient offline k-median algorithm. All previous algorithms for computing a (1+epsilon)-approximation for the k-median problem over dynamic data streams required space and time exponential in d. Our algorithms can be generalized to metric spaces of bounded doubling dimension.

Joint work with: Vladimir Braverman, Gereon Frahling, Harry Lang and Christian Sohler

Sasho Nikolov

University of Toronto

**Abstract:**
In the vector balancing problem, we are given N vectors v_1,..., v_N in an n-dimensional normed space, and our goal is to assign signs to them, so that the norm of their signed sum is as small as possible. The balancing constant of the vectors is the smallest number beta, such that any subset of the vectors can be balanced so that their signed sum has norm at most beta.

The vector balancing constant generalizes combinatorial discrepancy, and is related to rounding problems in combinatorial optimization, and to the approximate Caratheodory theorem. We study the question of efficiently approximating the vector balancing constant of any set of vectors, with respect to an arbitrary norm. We show that the vector balancing constant can be approximated in polynomial time to within factors logarithmic in the dimension, and is characterized by (an appropriately optimized version of) a known volumetric lower bound. Our techniques draw on results from geometric functional analysis and the theory of Gaussian processes. Our results also imply an improved approximation algorithm for hereditary discrepancy.

Joint work with Daniel Dadush, Nicole Tomczak-Jaegermann, and Kunal Talwar.

Contact Alexandr Andoni, Omri Weinstein, or Timothy Sun if you want to

There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here.