Columbia Theory Seminar, Spring 2019

For Spring 2019 the usual time for the meetings will be Fridays at 13:00 in the CS Conference Room (453 CSB). Abstracts for talks are given below the talk schedule.

Friday, January 25, 13:30, DSI Conference Room: David Wajc: The Greedy Algorithm is Not Optimal for On-line Edge Coloring (Abstract)
Friday, February 1, 13:30, CS Conference Room: Mika Göös: Adventures in Monotone Complexity and TFNP (Abstract)
Friday, February 8, 13:00, CS Conference Room: Michael Bender: Bloom Filters, Adaptivity, and the Dictionary Problem (Abstract)
Friday, February 15, 13:00, CS Conference Room: Shubhangi Saraf: Factors of sparse polynomials: structural results and some algorithms (Abstract)
Thursday, February 21, 12:30, CS Conference Room: Sasha Golovnev: Static Data Structure Lower Bounds Imply Rigidity (Abstract)
Monday, March 4, 14:00, CS Conference Room: Zhao Song: Solving Linear Programs in the Current Matrix Multiplication Time (Abstract)
Friday, March 8, 13:00, CS Conference Room: Shuchi Chawla: The curse of dimensionality in revenue optimal mechanism design: diabolical or non-existent? (Abstract)
Friday, March 15, 13:00, CS Conference Room: Dean Doron: Near-Optimal Pseudorandom Generators for Constant-Depth Read-Once Formulas (Abstract)
Tuesday, March 26: Vaggos Chatziafratis: An Optimization Viewpoint for Hierarchical Clustering (Abstract)
Thursday, April 4, 13:00, CS Conference Room: Bo Waggoner: Multi-Observation Losses (Abstract)
Friday, April 12, 13:00, CS Conference Room: Martin Farach-Colton: Algorithmic Games and External Memory Data Structures (Abstract)
Friday, April 19, 13:00, CS Conference Room: Mert Saglam: Near log-convexity of measured heat in (discrete) time and consequences (Abstract)
Thursday, April 25, 13:00, CS Conference Room: Tim Kunisky: Hardness of certification for random optimization problems (Abstract)
Friday, May 3, 13:00, CS Conference Room: Aaron Roth: Statistical Individual Fairness in Machine Learning (Abstract)
Monday, May 6, 13:00, CS Conference Room: Shuchi Chawla: The curse of dimensionality in revenue optimal mechanism design: diabolical or non-existent? (Abstract)
Wednesday, May 8, 13:00, CS Conference Room: Greg Valiant: New Problems and Perspectives on Learning, Testing, and Sampling in the Small Data Regime (Abstract)
Thursday, May 16, 13:00, CS Conference Room: Pieter Kleer: The Switch Algorithm for the Uniform Sampling of Graphs with a Given Degree Sequence (Abstract)
Tuesday, May 21, 13:00, CS Conference Room: Arnold Filtser: Steiner Point Removal with distortion O(log k), using the Relaxed Voronoi algorithm (Abstract)
Thursday, May 23: Eli Jaffe: On Explicit Depth-Robust Graphs (Abstract)

Talk Abstracts

Friday, January 25:
The Greedy Algorithm is Not Optimal for On-line Edge Coloring
David Wajc
Carnegie Mellon University

Abstract: The classic edge coloring problem asks to partition the edges of a graph into as few matchings (colors) as possible. By Vizing's theorem, any graph of maximum degree Δ can be edge colored using Δ+1 colors (in polynomial time). However, in many applications of edge coloring the graph is not entirely known in advance, making the problem harder. Motivated by such applications, we study edge coloring in the online adversarial vertex arrival model of Karp, Vazirani and Vazirani. Under such adversarial arrivals, a 1992 paper of Bar-Noy, Naor and Motwani, titled simply "The greedy algorithm is optimal for on-line edge coloring", proves the optimality of the trivial greedy algorithm, which on graphs with maximum degree Δ requires 2Δ-1 colors. However, their online lower bound requires Δ=O(log n), and they conjectured the existence of better algorithms than greedy for Δ large enough. We resolve this conjecture positively for bipartite graphs with Δ=ω(logn), and present optimal algorithms (up to o(1) terms) for such graphs. Moreover, we show a surprising dichotomy between the setting where Δ is known apriori and when this crucial parameter is unknown, and present optimal bounds for both known and unknown Δ scenarios.

Joint work with Ilan Reuven Cohen and Binghui Peng.

Friday, February 1:
Adventures in Monotone Complexity and TFNP
Mika Göös
Institute for Advanced Study

Abstract: *Separations:* We introduce a monotone variant of XOR-SAT and show it has exponential monotone circuit complexity. Since XOR-SAT is in NC^2, this improves qualitatively on the monotone vs. non-monotone separation of Tardos (1988). We also show that monotone span programs over R can be exponentially more powerful than over finite fields. These results can be interpreted as separating subclasses of TFNP in communication complexity.

*Characterizations:* We show that the communication (resp. query) analogue of PPA (subclass of TFNP) captures span programs over F_2 (resp. Nullstellensatz degree over F_2). Previously, it was known that communication FP captures formulas (Karchmer--Wigderson, 1988) and that communication PLS captures circuits (Razborov, 1995).

Joint work with Pritish Kamath, Robert Robere, and Dmitry Sokolov.

Friday, February 8:
Bloom Filters, Adaptivity, and the Dictionary Problem
Michael A. Bender
Stony Brook University

Abstract: A Bloom filter (or alternative) maintains a compact, probabilistic representation of a set S of keys from a universe U. The price of being small is that there is a (bounded) probability of false positives.

This talk presets alternatives to Bloom filters that are faster, more space efficient, and support a wider range of operations. We show how these filters can adapt based on the results of past queries.

Joint work with Mayank Goswami, Sam McCauley, Martin Farach-Colton, Rob Johnson, and Shikha Singh.

Friday, February 15:
Factors of sparse polynomials: structural results and some algorithms
Shubhangi Saraf
Rutgers University

Abstract: Are factors of sparse polynomials sparse? This is a really basic question and we are still quite far from understanding it in general.

In this talk, I will discuss a recent result showing that this is in some sense true when the polynomial has each variable appearing only with bounded degree. In particular we show if f is an s-sparse polynomial in n variables, with individual degrees of its variables bounded by d, then the sparsity of each factor of f is bounded by s^{O(d^2 logn)}. This is the first nontrivial bound on factor sparsity for any value of d which is more than 2. Our sparsity bound uses techniques from convex geometry, such as the theory of Newton polytopes and an approximate version of the classical Caratheodory's Theorem.

Using our sparsity bound, we then show how to devise efficient (quasi polynomial time) deterministic factoring algorithms for sparse polynomials of bounded individual degree. Prior to our work, the only efficient factoring algorithms known for this class of polynomials were randomized, and other than for the cases of d=1 and d=2, only exponential time deterministic factoring algorithms were known.

The talk is based on joint work with Vishwas Bhargav and Ilya Volkovich.

Thursday, February 21:
Static Data Structure Lower Bounds Imply Rigidity
Sasha Golovnev
Harvard University

Abstract: We show that static data structure lower bounds in the group (linear) model imply semiexplicit lower bounds on matrix rigidity. In particular, we prove that an explicit lower bound of t > omega(log^2 n) on the cell-probe complexity of linear data structures in the group model, even against arbitrarily small linear space (s = (1 + eps)n), would already imply a semi-explicit (P^NP) construction of rigid matrices with significantly better parameters than the current state of art (Alon, Panigrahy and Yekhanin, 2009). Our results further assert that polynomial (t >= n^delta) data structure lower bounds against near-optimal space, would imply super-linear circuit lower bounds for log-depth linear circuits (a four-decade open question). In the succinct space regime (s = n + o(n)), we show that any improvement on current cell-probe lower bounds in the linear model would also imply new rigidity bounds. Our results rely on a new connection between the "inner" and "outer" dimensions of a matrix (Paturi and Pudlak, 2006), and on a new reduction from worst-case to average-case rigidity, which is of independent interest.

Joint work with Zeev Dvir and Omri Weinstein.

Monday, March 4:
Solving Linear Programs in the Current Matrix Multiplication Time
Zhao Song
UT Austin

Abstract: We show how to solve linear programs with accuracy epsilon in time n^{omega+o(1)} log(1/epsilon) where omega~2.3728639 is the current matrix multiplication constant. This hits a natural barrier of solving linear programs since linear systems is a special case of linear programs and solving linear systems require time n^{omega} currently.

Joint work with Michael B. Cohen and Yin Tat Lee.

Friday, March 8:
The curse of dimensionality in revenue optimal mechanism design: diabolical or non-existent?
Shuchi Chawla
University of Wisconsin - Madison

Abstract: Multi-item revenue optimal mechanisms can be very complex, offering large menus of different options to the buyer, some of which may be randomized. A primary line of inquiry within algorithmic mechanism design is to establish approximate optimality via "simple" mechanisms. The conventional belief for many years was that revenue optimal mechanisms display the worst kind of curse of dimensionality: in one dimension, optimal mechanisms are always simple, whereas in two or more dimensions, the revenue gap between optimal and simple mechanisms is unbounded.

We challenge this conventional belief by showing that these large gaps can only happen in unrealistic situations, where the mechanism overcharges a buyer for a bundle while selling its constituents at much lower prices. Arguably this does not correctly account for the buyer's incentives as the buyer can break his order into smaller pieces paying a much lower price overall. Our main result is that if the buyer is allowed to purchase as many (randomized) bundles as he pleases, the revenue of any multi-item mechanism is at most O(log n) times the revenue achievable by item pricing, where n is the number of items. This holds in the most general setting possible, with an arbitrarily correlated distribution of buyer types and arbitrary valuations.

We also show that this result is tight in a very strong sense. Any family of mechanisms of sub-exponential description complexity cannot achieve better than logarithmic approximation even against the best deterministic mechanism and even for additive valuations. In contrast, item pricing that has linear description complexity matches this bound against randomized mechanisms.

Joint work with Yifeng Teng and Christos Tzamos.

Friday, March 15:
Near-Optimal Pseudorandom Generators for Constant-Depth Read-Once Formulas
Dean Doron
UT Austin

Abstract: We give an explicit nearly-optimal pseudorandom generator (PRG) for read-once AC0 formulas. Previously, PRGs with near-optimal seed length were known only for the depth-2 case by Gopalan et al.. For a constant depth d > 2, the best prior PRG is a recent construction by Forbes and Kelley with seed length O(log^2 n) for the more general model of constant-width read-once branching programs with arbitrary variable order. Looking beyond read-once AC0, we show that our PRG also fools formulas with a bounded number of parity gates.

Our construction follows Ajtai and Wigderson' s approach of iterated pseudorandom restrictions, and the main technical crux of our work is constructing a suitable pseudorandom restriction generator for read-once AC0 having a short seed.

Joint work with Pooya Hatami and William Hoza.

Tuesday, March 26:
An Optimization Viewpoint for Hierarchical Clustering
Vaggos Chatziafratis
Stanford

Abstract: Hierarchical Clustering (HC) is a widely studied problem in unsupervised learning and exploratory data analysis, usually tackled by simple agglomerative procedures like average-linkage, single-linkage or complete-linkage. Applications of HC include reasoning about text documents, understanding the evolution of species and the tree of life, decomposing social networks like Facebook, or even organizing large data centers efficiently. Surprisingly, despite the plethora of heuristics for tackling the problem, until recently there was no optimization objective associated with it; this is in stark contrast with flat clustering objectives like k-means, k-median and k-center. In this talk, we give an overview of the recent optimization objectives for Hierarchical Clustering, explore some approximation algorithms and mention some of their drawbacks.

Most of the talk is based on joint works with Moses Charikar and Rad Niazadeh. (Mainly based on: https://arxiv.org/pdf/1808.02227.pdf).

Thursday, April 4:
Multi-Observation Losses
Bo Waggoner
Microsoft Research

Abstract: In machine learning, we evaluate an algorithm's prediction on an observed data point using a loss function. This talk will discuss a pair of papers that sprouted from a simple question: What happens, from a theoretical "information elicitation" perspective, when the loss can be a function of multiple observations? These losses turn out to elicit higher-order properties like the variance and 2-norm of distributions, and they lead to new algorithms for estimating these quantities. Along the way, we'll see Voronoi diagrams on the simplex, the Real Nullstellensatz, learning from corrupted samples, and counting species of butterflies. (No familiarity with any of the above will be assumed!)

Joint work with Rafael Frongillo, Tom Morgan, Sebastian Casalaina-Martin, and Nishant Mehta.

Friday, April 12:
Algorithmic Games and External Memory Data Structures
Martin Farach-Colton
Rutgers

Abstract: In this talk, I'll consider new variants of Balls and Bins, as well as old variants of Cup Emptying, in which one play adds water to a set of cups and one player empties water from the cups. I'll show what these games can tell us about external-memory data structures and how they related to getting customers at a startup.

Friday, April 19:
Near log-convexity of measured heat in (discrete) time and consequences
Mert Saglam
University of Washington

Abstract: We answer a 1982 conjecture of Erdos and Simonovits about the growth of number of k-walks in a graph, which incidentally was posed as a question earlier by Blakley and Dixon in 1966. We prove this conjecture in a more general setup than the earlier treatment, furthermore, through a refinement and strengthening of this inequality, we resolve two related open questions in complexity theory: the communication complexity of the k-Hamming distance is Omega(k log k) and that consequently any property tester for k-linearity requires Omega(k log k) queries.

Thursday, April 25:
Hardness of certification for random optimization problems
Tim Kunisky
New York University

Abstract: Certifying bounds on random optimization problems is an important computational task arising in many settings, from bounding the number of constraints of a random SAT instance that can be satisfied to controlling the size of an extremal structure in a random graph. It is usually quite difficult, however, to prove for families of certification algorithms like linear or semidefinite programs that they fail to optimize a random objective past a certain value. I will describe a recent work in which we produce a reduction from the task of certifying bounds on random quadratic programs to the task of hypothesis testing in a statistical model. We then apply a recently-developed heuristic connecting low-degree polynomial approximations of the likelihood ratio to the hardness of hypothesis testing, which gives strong evidence for the hardness of certification for a large class of random quadratic programs. Interpreted geometrically, our work suggests a gap between the mean width of certain convex sets of matrices, such as the cut polytope, and the mean width of any computationally tractable convex relaxation thereof.

Based on joint work with Afonso Bandeira and Alex Wein.

Friday, May 3:
Statistical Individual Fairness in Machine Learning
Aaron Roth
UPenn

Abstract: The literature on fairness in machine learning has yet to settle on definitions. At a high level, there are two families of fairness definitions. "Individual" definitions of fairness ask for semantically meaningful constraints that bind at the level of individuals. When these constraints are defined with respect to the unknown labels, they are often impossible to satisfy. "Statistical" definitions of fairness ask for equality of some error metric (like false positive rate) evaluated over "protected" populations. These are easy to check and satisfy, but don't provide guarantees to individuals.
We show that some of the benefits of both definitions can be obtained if one posits a distribution on problems, in addition to a distribution on datapoints. For example, one may ask that the error rates of false positive rates be equalized between every pair of individuals, where now the "rate" is an average over problems, and not an average over people. This now corresponds to a meaningful promise to individuals. We give oracle-efficient algorithms for optimizing classification error subject to this constraint, and prove generalization bounds over -both- distributions (so that we can make promises both to new individuals and over new problems).

Joint work with Michael Kearns and Saeed Sharifimalvajerdi.

Monday, May 6:
The curse of dimensionality in revenue optimal mechanism design: diabolical or non-existent?
Shuchi Chawla
U Madison

Abstract: Multi-item revenue optimal mechanisms can be very complex, offering large menus of different options to the buyer, some of which may be randomized. A primary line of inquiry within algorithmic mechanism design is to establish approximate optimality via "simple" mechanisms. The conventional belief for many years was that revenue optimal mechanisms display the worst kind of curse of dimensionality: in one dimension, optimal mechanisms are always simple, whereas in two or more dimensions, the revenue gap between optimal and simple mechanisms is unbounded.
We challenge this conventional belief by showing that these large gaps can only happen in unrealistic situations, where the mechanism overcharges a buyer for a bundle while selling its constituents at much lower prices. Arguably this does not correctly account for the buyer's incentives as the buyer can break his order into smaller pieces paying a much lower price overall. Our main result is that if the buyer is allowed to purchase as many (randomized) bundles as he pleases, the revenue of any multi-item mechanism is at most O(log n) times the revenue achievable by item pricing, where n is the number of items. This holds in the most general setting possible, with an arbitrarily correlated distribution of buyer types and arbitrary valuations.
We also show that this result is tight in a very strong sense. Any family of mechanisms of sub-exponential description complexity cannot achieve better than logarithmic approximation even against the best deterministic mechanism and even for additive valuations. In contrast, item pricing that has linear description complexity matches this bound against randomized mechanisms.

Joint work with Yifeng Teng and Christos Tzamos.

Wednesday, May 8:
New Problems and Perspectives on Learning, Testing, and Sampling in the Small Data Regime
Greg Valiant
Stanford

Abstract: I will discuss several new problems related to the general challenge of understanding what conclusions can be made, given a dataset that is relatively small in comparison to the complexity or dimensionality of the underlying distribution from which it is drawn. In the first setting we consider the problem of learning a population of Bernoulli (or multinomial) parameters. This is motivated by the ``federated learning" setting where we have data from a large number of heterogeneous individuals, who each supply a very modest amount of data, and ask the extent to which the number of data sources can compensate for the lack of data from each source. Second, we will discuss the problem of estimating the ``learnability'' of a dataset: given too little labeled data to train an accurate model, we show that it is often possible to estimate the extent to which a good model exists. Specifically, given labeled data pairs (x, y) drawn from some unknown distribution over such pairs, it is possible to estimate how much of the variance of y can be explained via the best linear function of x, even in the regime where it is impossible to approximate that linear function. Finally, I will introduce the problem of data "amplification". Given n independent draws from a distribution, D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where n is too small to learn D to any nontrivial accuracy. We also discuss connections between this setting and the challenge of interpreting the behavior of GANs and other ML/AI systems. This talk will also highlight a number of concrete and more conceptual open directions in all three veins.

This work is based on several papers, with Weihao Kong, and with Brian Axelrod, Shivam Garg, and Vatsal Sharan.

Thursday, May 16:
The Switch Algorithm for the Uniform Sampling of Graphs with a Given Degree Sequence
Pieter Kleer
CWI Amsterdam

Abstract: The uniform generation (or sampling) of a simple graph with a given degree sequence is an open problem in computer science that has been studied for several decades. One of the simplest approaches to this problem is the switch algorithm: start with an initial graph with the desired degree sequence and repeatedly switch two edges uniformly at random, while preserving the degree sequence. How many switches are needed before the resulting graph is close to a uniform sample from the set of all graphs with the given degree sequence (independent of the initial graph)? Kannan, Tetali and Vempala initiated the study of this question some twenty years ago, but, despite having received considerable attention, an answer to question is still not known.

In this talk we will present a novel proof approach for the analysis of the switch Markov chain (naturally induced by the switch algorithm). In particular, we show that the switch Markov chain is rapidly mixing for a large range of degree sequences that unifies (almost) all results in the literature (the chain is called rapidly mixing if only a polynomial number of switches is needed to get close to a uniform sample). We also discuss an extension of the problem, where, in addition to the degree sequence, also degree correlations are specified. We make the first progress on the question of sampling graphs with such additional constraints. Our analysis relies on ideas of Bhatnagar, Randall, Vazirani and Vigoda introduced in the context of sampling exact (perfect) matchings, together with our new approach for the analysis of the switch Markov chain.

Joint work with Yorgos Amanatidis (CWI Amsterdam)

Tuesday, May 21:
Steiner Point Removal with distortion O(log k), using the Relaxed Voronoi algorithm
Arnold Filtser
Ben-Gurion University Of The Negev

Abstract: In the Steiner Point Removal (SPR) problem, we are given a weighted graph G=(V,E) and a set of terminals K\subset V of size k. The objective is to find a minor M of G with only the terminals as its vertex set, such that distances between the terminals will be preserved up to a small multiplicative distortion.

Kamma, Krauthgamer and Nguyen [SICOMP2015] devised a ball-growing algorithm with exponential distributions to show that the distortion is at most O(log^5 k).

Cheung [SODA2018] improved the analysis of the same algorithm, bounding the distortion by O(log^2 k).

We devise a novel and simpler algorithm (called the Relaxed Voronoi algorithm) which incurs distortion O(log k). This algorithm can be implemented in almost linear time (O(|E| log |V|)).

Thursday, May 23:
On Explicit Depth-Robust Graphs
Eli Jaffe
UCLA

Abstract: We study the problem defined by Erdos and Szemeredi in 1975 of constructing sparse depth- robust graphs. Recall that a directed acyclic graph G is (e,d)-depth-robust if it is guaranteed to contain a path of length d even after the deletion of any e nodes and all of their incident edges. The original construction (of nearly optimal depth-robust graphs) of Erdos and Szemeredi required logarithmic in-degree and subsequent work by Mahmoody, Moran, and Vadhan made that construction explicit. One of the major open questions left since that 1975 seminal work was to construct depth- robust graphs of constant degree. Our contribution is the first explicit construction of constant-degree depth-robust graphs. Our construction too enjoys nearly optimal depth-robustness. We accomplish this via a novel expanding graph product operator X that takes three input graphs (G,H,X) with special properties and outputs a new graph. Informally, we show that our operator provides the following guarantee: if G and H are depth-robust graphs and H is a constant-degree expanding graph [RVW00], than G = X(G, H, X) is a depth-robust graph of size |G| * |H| whose degree depends only additively on degrees of H and X. We then show that the recursive application of the expanding graph product operator yields a simple and explicit iterative construction for constant-degree depth-robust graphs of arbitrary size. In particular, we show that a graph of size n will enjoy (Omega(n1-eps),Omega(n1-eps))- depth-robustness for any eps > 0 and give an algorithm for computing labels of all nodes that have a direct edge to/from a given node labeled i in time poly(log i). Ours is the first explicitly constructed constant-degree depth robust graph with guaranteed lower bounds on its depth-robustness (in contrast to only probabilistic guarantees). Previous explicit constructions were of logarithmic degree or worked only with probability < 1. Beyond theoretical relevance, our construction has practical implications including a new data-independent memory-hard function (iMHF), a desirable cryptographic primitive for crypto-currencies, with guaranteed lower bounds on its memory complexity.

Contact Alexandr Andoni, Omri Weinstein, or Timothy Sun if you want to volunteer to give a talk (especially students!). The talk can be about your or others' work. It can be anything from a polished presentation of a completed result, to an informal black-board presentation of an interesting topic. It should be accessible to a general theory audience. Alex will be happy to help students choose papers to talk about.

There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here.

Back to Theory of Computation at Columbia.