Columbia Theory Seminar, Spring 2011

For Spring 2011, the usual time for the meetings will be Mondays, 1:00pm - 2:30pm, in the CS conference room, CSB 453. Abstracts for the talks are given below the talk schedule.


  • Friday, February 25, 10:00am, CSB 453 (note unusual date and time): Moshe Lewenstein: Fast, precise and dynamic distance queries (Abstract)

  • Monday, February 28, 1:00pm, CSB 453: Dov Gordon: Secure Computation: From Theory Towards Practice (Abstract)

  • Monday, March 7, 1:00pm, CSB 453: Sergei Vassilvitskii: Cross-Validation and Mean-Square Stability (Abstract)

  • Thursday, March 24, 11:00am, CSB 477 (note unusual date, time, and place): Nova Fandina: Optical Solution for restricted Hard on Average NEXP Time problems (Abstract)

  • Monday, April 4, 1:00pm, CSB 453: David Xiao: Privacy, incentives, and truthfulness (Abstract)

  • Monday, April 11, 1:00pm, CSB 453: Shubhangi Saraf: High rate error correcting codes with sublinear time decoding (Abstract)

  • Monday, April 18, 1:00pm, CSB 453: Madhur Tulsiani: Decomposition Theorems in Combinatorics and Complexity (Abstract)

  • Monday, April 25, 12:30pm, CSB 477 (note unusual time and place): Yevgeniy Vahlis: Verifiable Delegation of Computation over Large Datasets (Abstract)

  • Monday, May 2, 1:00pm, CSB 453: Xi Chen: Non-negatively Weighted #CSP: An Effective Complexity Dichotomy (Abstract)

  • Monday, June 6, 1:00pm, CSB 453: Mike Rosulek: The (nearly) all-or-nothing nature of universally composable security (Abstract)

  • Thursday, June 16, 2:00pm, CSB 453 (note unusual date and time): Kevin Matulef: Property Testing Lower Bounds via Communication Complexity (Abstract)

  • Wednesday, July 13, 2:30pm, CSB 477 (note unusual date, time, and place): Andrew Wan: Pseudorandomness for read-once formulas (Abstract)

  • Thursday, July 21, 3:00pm, CSB 477 (note unusual date, time, and place): Ilias Diakonikolas: On the Complexity of Efficiency-Revenue Trade-offs in Bayesian Auctions (Abstract)

  • Wednesday, August 10, 2:00pm, CSB 453 (note unusual date and time): Anindya De: Extractors and lower bounds for locally samplable sources (Abstract)



  Talk Abstracts


    Friday, February 25:
    Fast, precise and dynamic distance queries
    Moshe Lewenstein
    Bar-Ilan University

    Abstract: We present an approximate distance oracle for a point set S with n points and doubling dimension [;\lambda;]. For every [;\epsilon > 0;], the oracle supports [;(1 + \epsilon);] - approximate distance queries in (universal) constant time, occupies space [;[\epsilon^{-O(\lambda)} + 2^{O(\lambda \log \lambda)}]n;], and can be constructed in [;[2^{O(\lambda)}\log^3 n + \epsilon^{-O(\lambda)} + 2^{O(\lambda \log \lambda)}]n;] expected time. This improves upon the best previously known constructions, presented by Har-Peled and Mendel. Furthermore, the oracle can be made fully dynamic with expected [;O(1);] query time and only [;2^{O(\lambda)} \log n + \epsilon^{-O(\lambda)} + 2^{O(\lambda \log \lambda)};] update time. This is the first fully dynamic [;(1+\epsilon);]-distance oracle.

    Coauthors: Yair Bartal, Lee-Ad Gottlieb, Tsvi Kopelowitz, Liam Roditty


    Monday, February 28:
    Secure Computation: From Theory Towards Practice
    Dov Gordon
    Columbia University

    Abstract: In 1982, Yao introduced the field of "secure computation", in which [;n;] parties, holding private inputs [;x_1,...,x_n;], engage in a protocol to compute some function [;f(x_1,...,x_n);], while revealing nothing more than the output. In the decade that followed, the topic of secure computation was thoroughly explored, and almost every theoretical question was answered. In the past decade, the focus has shifted towards improving efficiency and building implementations. Today, secure computation is poised to become an extremely powerful tool with wide-ranging applications. Both bodies of research were essential in bringing this about.

    In this talk, we review two recent results. The first provides insight into one of the few remaining theoretical questions in secure computation. The second demonstrates improved efficiency for secure computation in a particular setting of interest. On the theoretical side, we discuss the problem of "fairness" in secure computation, which is a security property guaranteeing simultaneous output delivery to both parties. Until recently, this was widely believed to be impossible to achieve. We will discuss a new relaxed notion of fairness and show broad feasibility results. We will also touch on a very exciting result demonstrating (for the first time) that some interesting functions can in fact be computed with complete fairness. In the second half of the talk, we will focus on a particular setting where a server owns a large database, and repeatedly interacts with a client in a sequence of secure computations over that data. We demonstrate great improvement over the existing general solutions by amortizing the cost of the computation. In presenting these two results, we hope to demonstrate the importance of both foundational and applied cryptographic theory.


    Monday, March 7:
    Cross-Validation and Mean-Square Stability
    Sergei Vassilvitskii
    Yahoo! Research

    Abstract: A popular practical method of obtaining a good estimate of the error rate of a learning algorithm is [;k;]-fold cross-validation. Here, the set of examples is first partitioned into [;k;] equal-sized folds. Each fold acts as a test set for evaluating the hypothesis learned on the other [;k-1;] folds. The average error across the [;k;] hypotheses is used as an estimate of the error rate. Although widely used, especially with small values of [;k;] (such as 10), the cross-validation method has heretofore resisted theoretical analysis due to the fact that the [;k;] distinct estimates have inherent correlations between them. With only sanity-check bounds known, there is no compelling reason to use the [;k;]-fold cross-validation estimate over a simpler holdout estimate.
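
    The procedure just described is short enough to sketch in code. Below is a minimal Python sketch (not from the talk); the learner train, the loss function error, and the dataset are placeholder parameters supplied by the caller.

        import random

        def k_fold_cv_error(examples, train, error, k=10):
            """Estimate a learning algorithm's error rate by k-fold cross-validation.

            examples: list of (x, y) pairs
            train:    maps a list of examples to a hypothesis h (a callable)
            error:    maps (hypothesis, test examples) to an error rate in [0, 1]
            """
            examples = examples[:]                        # copy, then randomly partition
            random.shuffle(examples)
            folds = [examples[i::k] for i in range(k)]    # k roughly equal-sized folds
            fold_errors = []
            for i in range(k):
                test = folds[i]
                training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
                h = train(training)                       # learn on the other k-1 folds
                fold_errors.append(error(h, test))
            return sum(fold_errors) / k                   # average of the k estimates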

    Conventional wisdom is that the averaging in cross-validation leads to a tighter concentration of the estimate of the error around its mean. We show that the conventional wisdom is essentially correct. We analyze the reduction in variance of the gap between the cross-validation estimate and the true error rate, and show that for a large family of stable algorithms cross-validation achieves a near optimal variance reduction factor of [;\frac{1 +o(1)}{k};]. In these cases, the [;k;] different estimates are essentially independent of each other.

    To proceed with the analysis, we define a new measure of algorithm stability, called mean-square stability. This measure is weaker than most stability notions described in the literature, and encompasses a large class of algorithms including bounded SVM regression and regularized least-squares regression. For slightly less stable algorithms such as t-nearest-neighbor, we show that cross-validation leads to an [;O\left(\frac{1}{\sqrt{k}}\right);] reduction in the variance of the generalization error.


    Thursday, March 24:
    Optical Solution for restricted Hard on Average NEXP Time problems
    Nova Fandina
    Ben-Gurion University

    Abstract: Galperin and Wigderson proposed a succinct representation for graphs that uses a number of bits logarithmic in the number of nodes. They proved complexity results for various decision problems on graph properties when the graph is given in this succinct representation. Later, Papadimitriou and Yannakakis showed that, under the same succinct encoding method, a certain class of decision problems on graph properties becomes exponentially hard. We consider the complexity of the Permanent problem when the graph/matrix is given in a restricted succinct representation. We present an optical architecture, based on the holographic concept, for solving the balanced succinct permanent problem.
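
    For intuition, here is a toy sketch (not from the talk) of the kind of encoding involved: in the Galperin-Wigderson representation, a graph on [;N = 2^m;] nodes is described not by its [;N \times N;] adjacency matrix but by a small circuit or formula that, given the [;m;]-bit names of two vertices, decides whether they are adjacent. The hypothetical Python predicate below describes the hypercube graph on [;2^{20};] nodes with a constant-size rule.

        # A graph on N = 2**20 nodes, described succinctly: u and v are adjacent
        # iff their 20-bit names differ in exactly one position (the hypercube graph).
        M = 20

        def adjacent(u: int, v: int) -> bool:
            """Succinct adjacency predicate: reads only the binary names of u and v."""
            assert 0 <= u < 2 ** M and 0 <= v < 2 ** M
            diff = u ^ v
            return diff != 0 and diff & (diff - 1) == 0   # exactly one bit differs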

    Joint work with S. Dolev, J. Rosen.


    Monday, April 4:
    Privacy, incentives, and truthfulness
    David Xiao
    Université Paris-Sud

    Abstract: Privacy has become an ever more pressing concern as we conduct more and more of our lives in public forums such as the Internet. One privacy question that has received much study is how a database curator may output "sanitized" data that does not reveal too much information about any particular individual. This criterion has been formalized as differential privacy, proposed originally by Dwork et al. (TCC '06 and ICALP '06), which captures the idea that "the presence or absence of any individual's data does not change the distribution of the sanitized data by much". This guarantee has been interpreted to mean that individuals should be comfortable revealing their information, since their participation barely changes the output.
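
    Concretely, the standard formalization of differential privacy requires that a randomized mechanism [;M;] satisfy, for every pair of databases [;D, D';] differing in a single individual's data and every set [;S;] of possible outputs, [;\Pr[M(D) \in S] \leq e^{\epsilon} \cdot \Pr[M(D') \in S];].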

    In this talk, we argue that this interpretation is incomplete: unless participation in the database somehow explicitly benefits them, individuals will always refuse to participate truthfully, regardless of whether the sanitization mechanism is differentially private or not.

    We therefore advocate studying differential privacy in conjunction with the notion of truthfulness from game theory, which says that a mechanism should be designed so that it is in the individuals' own interest to give their true information. We show that there exist games for which differentially private mechanisms, in particular the exponential mechanism of McSherry and Talwar (FOCS '07), do not motivate the individuals to participate truthfully. On the positive side, we show that a wide class of games does admit differentially private, truthful, and efficient mechanisms.

    Finally, we explore the possibility of tradeoffs between utility and privacy. This is because individuals may be willing to give up some privacy if they receive enough utility from a game, and vice versa. We show that, under a natural measure of information cost, even the release of a differentially private histogram may reveal so much information that individuals would rather suffer the consequences of lying than have their information published.


    Monday, April 11:
    High rate error correcting codes with sublinear time decoding
    Shubhangi Saraf
    MIT

    Abstract: Locally decodable codes are error-correcting codes that admit efficient decoding algorithms: they give a method to encode [;k;]-bit messages into [;n;]-bit codewords such that, even after a constant fraction of the bits of the codeword get corrupted, any bit of the original message can be recovered by looking at only [;q(k);] bits of the corrupted codeword. The tradeoff between the rate of a code (i.e., the ratio [;k/n;]) and the locality/efficiency (the function [;q(k);]) of its decoding algorithms has been studied extensively in the past decade. However, most prior work has focused on codes with extremely small [;q;] (e.g., constant functions), and the resulting constructions suffer from extremely poor rate ([;k/n;] goes to zero; in fact [;n;] grows nearly exponentially in [;k;]). Indeed it was widely believed that such behavior is inherent: known codes with locality [;k^{\frac{1}{c}};] had rate [;2^{- O(c)};], and no code with non-trivial locality ([;q = o(k);]) and rate greater than 1/2 (the practical regime) was known.
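
    As a standard toy example of this phenomenon (not discussed in the talk): the Hadamard code has the smallest possible locality, [;q = 2;], but rate only [;k/2^k;], so the codeword length is exponential in the message length. A minimal Python sketch:

        import random

        def hadamard_encode(x):
            """Encode a k-bit message x (a list of 0/1) as all 2^k parities <x, a> mod 2."""
            k = len(x)
            return [sum(x[j] & ((a >> j) & 1) for j in range(k)) % 2
                    for a in range(2 ** k)]

        def decode_bit(codeword, i, k):
            """Recover message bit i by reading only two (possibly corrupted) positions."""
            a = random.randrange(2 ** k)        # random query position
            b = a ^ (1 << i)                    # the same position with coordinate i flipped
            return codeword[a] ^ codeword[b]    # equals x_i w.h.p. when few bits are corrupted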

    In this talk we overcome the rate-1/2 barrier and give a new class of codes with very high rates (arbitrarily close to 1) and strong local decoding properties ([;q(k) = k^{\epsilon};] for arbitrarily small [;\epsilon;]), thereby giving new performance tradeoffs between the rate and locality of decoding. These codes, which we call multiplicity codes, are based on evaluating high degree multivariate polynomials and their derivatives. Multiplicity codes extend traditional multivariate polynomial based codes; they inherit the local decodability of these codes, and at the same time achieve better tradeoffs and flexibility in the rate and decodability. These codes give respectable parameters even in concrete settings (and not just asymptotically), giving hope that local decoding techniques may have practical implications.

    Based on joint work with Swastik Kopparty (IAS) and Sergey Yekhanin (MSR, Silicon Valley)


    Monday, April 18:
    Decomposition Theorems in Combinatorics and Complexity
    Madhur Tulsiani
    Princeton University and IAS

    Abstract: The term "decomposition theorems" is often used to describe results in combinatorics which analyze how an object of study behaves with respect to a family of tests. Such results decompose an object into a part which has a simple structure and a part whose structure is completely invisible to the tests in question. I will discuss the following results proved in somewhat different contexts, and show how they can be cast in this framework:

    (a) the Weak Regularity Lemma of Frieze and Kannan, a result in graph theory,

    (b) the "Dense Model Theorem" of Green, Tao and Ziegler, a result in additive combinatorics, that helps transfer (for example) results about dense sets of integers to results about the primes, and

    (c) the Impagliazzo Hard-Core Set lemma, a result from computational complexity theory, which is a structure theorem for functions that are hard to compute.

    I will show that these results are very strongly related, in that they not only imply each other, but the proof techniques used for proving each of them can be used to prove a general result from which all three follow quite easily. I will also discuss some open questions that are raised by the above connections and mention some recent results.

    Based on joint works with Omer Reingold, Luca Trevisan and Salil Vadhan.


    Monday, April 25:
    Verifiable Delegation of Computation over Large Datasets
    Yevgeniy Vahlis
    Columbia University

    Abstract: We study the problem of computing on large datasets that are stored on an untrusted server. We follow the approach of amortized verifiable computation introduced by Gennaro, Gentry, and Parno in CRYPTO 2010. We present the first practical verifiable computation scheme for high degree polynomial functions. Such functions can be used, for example, to make predictions based on polynomials fitted to a large number of sample points in an experiment. In addition to the many non-cryptographic applications of delegating high degree polynomials, we use our verifiable computation scheme to obtain new solutions for verifiable keyword search, and proofs of retrievability. Our constructions are based on the DDH assumption and its variants, and achieve adaptive security, which was left as an open problem by Gennaro et al. (albeit for general functionalities).

    Joint work with Siavosh Benabbas and Rosario Gennaro.


    Monday, May 2:
    Non-negatively Weighted #CSP: An Effective Complexity Dichotomy
    Xi Chen
    Columbia University

    Abstract: We prove a complexity dichotomy theorem for all non-negatively weighted counting Constraint Satisfaction Problems (#CSP). This caps a long series of important results on counting problems including unweighted and weighted graph homomorphisms and the celebrated dichotomy theorem for unweighted #CSP. Our dichotomy theorem gives a succinct criterion for tractability. If a set F of constraint functions satisfies the criterion, then the #CSP problem defined by F is solvable in polynomial time; if F does not satisfy the criterion, then the problem is #P-hard. We furthermore show that the question of whether F satisfies the criterion is decidable in NP.

    Surprisingly, our tractability criterion is simpler than the previous criteria for the more restricted classes of problems, although when specialized to those cases, they are logically equivalent. Our proof mainly uses Linear Algebra, and represents a departure from Universal Algebra, the dominant methodology in recent years.

    Joint work with Jin-Yi Cai and Pinyan Lu.


    Monday, June 6:
    The (nearly) all-or-nothing nature of universally composable security
    Mike Rosulek
    University of Montana

    Abstract: The universal composition (UC) framework is the most realistic model for protocol security in complex settings like the Internet. However, it is also the most demanding; in fact, UC-secure protocols are impossible for many important tasks of interest. One way around these impossibility results is to allow parties access to a trusted setup functionality, such as a fair coin-tossing functionality. If all tasks have a UC-secure protocol when allowed access to a particular trusted setup (as is the case for coin-tossing), we say that the setup is *complete*. An important question is to identify which setups are complete, or, more generally: given some description of a functionality, how can we tell how helpful it is as a trusted setup in the UC framework?

    In this talk I'll discuss a very simple but powerful property called "splittability" that can be easily determined for arbitrary (even randomized and reactive) 2-party functionalities. Splittability can then be used to derive two important characterizations for the power of a functionality: (1) a complete characterization for which tasks have UC-secure protocols without trusted setups (such tasks are therefore useless themselves as a trusted setup), and (2) an almost-complete characterization for which tasks are complete. Comparing these two characterizations, we can see that nearly every functionality is either trivial or complete for UC security! The functionalities which are neither trivial nor complete appear to all be quite unnatural.

    Some results discussed in this talk are joint work with Manoj Prabhakaran.


    Thursday, June 16:
    Property Testing Lower Bounds via Communication Complexity
    Kevin Matulef
    Tsinghua University

    Abstract: We develop a new technique for proving lower bounds in property testing, by showing a strong connection between testing and communication complexity. We give a simple scheme for reducing communication problems to testing problems, thus allowing us to use known lower bounds in communication complexity to prove lower bounds in testing. This scheme is general and implies a number of new testing bounds, as well as simpler proofs of several known bounds. In this talk we will illustrate our technique by proving lower bounds for testing juntas, k-linearity, monotonicity, and more.

    Based on joint work with Eric Blais and Joshua Brody.


    Wednesday, July 13:
    Pseudorandomness for read-once formulas
    Andrew Wan
    Tsinghua University

    Abstract: We give an explicit construction of a pseudorandom generator for read-once formulas whose inputs can be read in arbitrary order. For formulas on [;n;] inputs with arbitrary gates of fan-in at most [;d = O(n/\log n);], the pseudorandom generator uses [;(1 - \Omega(1))n;] bits of randomness and produces an output that looks [;2^{-\Omega(n)};]-pseudorandom to all such formulas.

    The construction of explicit pseudorandom distributions for various models of computation is a central research program in computational complexity. Celebrated examples include generators that fool circuits of constant depth [AB84, AW85, Nis91, Bra10] and logarithmic space machines that read their input once and in a fixed, left-to-right order [Nis92, INW94]. Read-once boolean formulas are one of the simplest natural classes of computations for which no efficient construction of pseudorandom generators was previously known.

    Joint work with Andrej Bogdanov and Periklis Papakonstantinou


    Thursday, July 21:
    On the Complexity of Efficiency-Revenue Trade-offs in Bayesian Auctions
    Ilias Diakonikolas
    UC Berkeley

    Abstract: When agents with independent priors bid for a single item, Myerson’s optimal auction maximizes expected revenue, whereas Vickrey’s second-price auction optimizes social welfare. We address the natural question of trade-offs: auctions that, say, optimize revenue without losing too much welfare. If one allows randomized mechanisms, it is easy to see that there are polynomial-time mechanisms that achieve any point on the trade-off (the Pareto curve) between revenue and welfare. We ask whether one can achieve the same guarantees using *deterministic* mechanisms. We provide a negative answer to this question by showing that this is an NP-hard problem. On the positive side, we provide polynomial-time deterministic mechanisms that approximate any point of the trade-off between these two fundamental objectives.

    (Joint work with Christos Papadimitriou, George Pierrakos and Yaron Singer.)


    Wednesday, August 10:
    Extractors and lower bounds for locally samplable sources
    Anindya De
    UC Berkeley

    Abstract: The talk deals with the following two seemingly different problems:

    - Extracting randomness from a high-entropy distribution sampled in NC0
    - Constructing explicit distributions not samplable in NC0

    Using (as a black box) a long line of work on affine extractors, we show that it is possible to extract approximately k^2/n bits from any distribution sampled by an NC0 circuit that has min-entropy k, as long as k >> n^{2/3}, with inverse exponential error. Next, using this result, we construct an explicit function f : {0,1}^n -> {0,1} such that it is not possible to sample (U_n, f(U_n)) up to inverse exponential error (over random) by NC0 circuits. The previous best result, by Viola (FOCS 2010), only gave a function f : {0,1}^n -> {0,1} such that it is not possible to sample (U_n, f(U_n)) up to inverse logarithmic error.

    Joint work with Thomas Watson (Berkeley); to appear in RANDOM 2011.


    Contact tal-at-cs-dot-columbia-dot-edu if you want to volunteer to give a talk (students are especially encouraged to do so!). The talk can be about your own or others' work. It can be anything from a polished presentation of a completed result to an informal blackboard presentation of an interesting topic where you are stuck on some open problem. It should be accessible to a general theory audience. I will be happy to help students choose papers to talk about.
    There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here. If you want to unsubscribe or change your options, send email to theoryread-request@lists.cs.columbia.edu with the word `help' in the subject or body (don't include the quotes), and you will get back a message with instructions.

    Comments on this page are welcome; please send them to tal-at-cs.columbia.edu

    Last updated 02/24/2011.
