Columbia Theory Seminar, Spring 2012

For Spring 2012, the usual time for the meetings will be Friday at 12:00 - 13:30 in the CS conference room, CSB 453. Abstracts for talks are given below the talk schedule.

  • Wednesday, January 18, 3:00pm, CSB 453 (note unusual date and time): Arnab Bhattacharyya: Testing Low Complexity Affine-Invariant Properties (Abstract)

  • Friday, February 3, 12:00pm, CSB 453: Olga Ohrimenko: Privacy-Preserving Group Data Access via Stateless Oblivious RAM Simulation (Abstract)

  • Friday, February 10, 12:00pm, CSB 453: Geetha Jagannathan: Differentially-Private Learning of Low Dimensional Manifolds (Abstract)

  • Friday, February 17, 12:00pm, CSB 453: Rong Ge: Computing a Nonnegative Matrix Factorization -- Provably (Abstract)

  • Friday, February 24, 12:00pm, CSB 453: Ilias Diakonikolas: Learning (and Testing) Structured Distributions (Abstract)

  • Friday, March 2, 12:00pm, CSB 453: Christos Papadimitriou: Computational Insights and the Theory of Evolution (Abstract)

  • Friday, March 23, 12:00pm, CSB 453: Pablo Azar: Rational Proof Systems and Scoring Rules (Abstract)

  • Friday, April 6, 12:00pm, CSB 453: Ilias Diakonikolas: Reconstructing Boolean Threshold Functions from their Average Satisfying Assignment (Abstract)

  • Friday, April 13, 12:00pm, CSB 453: Nir Ailon: Efficient Adaptive Querying Strategies for Clustering and Ordering Problems (Abstract)

  • Friday, April 20, 12:00pm, CSB 453: Stavros Kolliopoulos: Planar Disjoint-Paths Completion (Abstract)

  • Tuesday, May 8, 12:00pm, CSB 477 (Open meeting area) (note unusual date and place): Anna Lysyanskaya: Tamper and Leakage Resilience in the Split-State Model (Abstract)

  • Tuesday, May 15, 2:00pm, CSB 477 (Open meeting area) (note unusual time, date, and place): Adam O'Neill: On Security Notions for Functional Encryption (Abstract)

  • Thursday, May 17, 11:00am, CSB 453 (note unusual time and date): Nisheeth Vishnoi: Solve, Walk, Cut (Abstract)

  • Friday, May 18, 12:00pm, CSB 453: Klim Efremenko: From Irreducible Representations to Locally Decodable Codes (Abstract)

  • Wednesday, May 23, 2:00pm, CSB 453 (note unusual time and date): Shengyu Zhang: On the Complexity of Trial and Error (Abstract)

  • Monday, July 2, 1:00pm, CSB 453 (note unusual time and date): Anindya De: The Inverse Shapley Value Problem (Abstract)

Talk Abstracts

    Wednesday, January 18:
    Testing Low Complexity Affine-Invariant Properties
    Arnab Bhattacharyya
    Princeton University & CCI

    Abstract: Invariance with respect to linear or affine transformations of the domain is arguably the most common symmetry exhibited by natural algebraic properties. Here, we will show that any low complexity affine-invariant property of multivariate functions over finite fields is testable with a constant number of queries. This immediately reproves, for instance, that the Reed-Muller code over [;F_p;] of degree d [;<;] p is testable, with an argument that uses no detailed algebraic information about polynomials except that low degree is preserved by composition with affine maps.

The complexity of an affine-invariant property P refers to the maximum complexity, as defined by Green and Tao (Ann. Math. 2008), of the sets of linear forms used to characterize P. A more precise statement of our main result is that for any fixed prime p [;\geq;] 2 and fixed integer R [;\geq;] 2, any affine-invariant property P of functions f: [;F_p^n;] [;\rightarrow;] [R] is testable, assuming the complexity of the property is less than p. Our proof involves developing analogs of graph-theoretic techniques in an algebraic setting, using tools from higher-order Fourier analysis.

    Joint work with Eldar Fischer (Technion) and Shachar Lovett (IAS)

    Friday, February 3:
    Privacy-Preserving Group Data Access via Stateless Oblivious RAM Simulation
    Olga Ohrimenko
    Brown University

    Abstract: The "cloud" is emerging as the next generation of data storage, where users can remotely keep their data and leave its management to a third party, e.g., Amazon S3 or Microsoft Azure. However, the fact that users no longer have physical possession of their data raises new challenges for data privacy. Storing the data in encrypted form is a key component of maintaining privacy; however, encryption alone is not enough, since information about the data may still be leaked through the pattern in which users access it. We show how to achieve efficient privacy-preserving data access using a combination of probabilistic encryption, which directly hides data values, and stateless oblivious RAM simulation, which hides the pattern of data accesses.

    In this talk we propose a method with O(log n) worst-case access overhead for simulating users' requests to outsourced data of size n, using a scheme that is data-oblivious with very high probability. We assume that the simulation has access to a small private workspace but does not maintain state in between data access requests. Our simulation makes use of pseudo-random hash functions and is based on a novel hierarchy of cuckoo hash tables that all share a common stash. The method outperforms all previous techniques for stateless clients in terms of access overhead.
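    To give a feel for the underlying data structure, here is a toy (and entirely non-oblivious) cuckoo hash table with an externally supplied stash, so that several tables in a hierarchy can share one stash. The class name, the random salts standing in for pseudo-random hash functions, and all parameters are invented for this sketch; this is not the paper's construction.

```python
import random

class CuckooTable:
    """Toy cuckoo hash table with two hash functions and a shared stash.

    Illustrative sketch only: random salts stand in for the pseudo-random
    hash functions, and the stash list is passed in so several tables
    (as in a hierarchy) can share it.
    """
    def __init__(self, capacity, stash, max_kicks=32):
        self.t1 = [None] * capacity
        self.t2 = [None] * capacity
        self.stash = stash          # shared across tables in a hierarchy
        self.max_kicks = max_kicks
        self.cap = capacity
        self.s1 = random.randrange(1 << 30)   # salts for the two hashes
        self.s2 = random.randrange(1 << 30)

    def _h1(self, key): return hash((self.s1, key)) % self.cap
    def _h2(self, key): return hash((self.s2, key)) % self.cap

    def insert(self, key):
        cur = key
        for _ in range(self.max_kicks):
            i = self._h1(cur)
            if self.t1[i] is None:
                self.t1[i] = cur; return
            cur, self.t1[i] = self.t1[i], cur   # evict occupant of table 1
            j = self._h2(cur)
            if self.t2[j] is None:
                self.t2[j] = cur; return
            cur, self.t2[j] = self.t2[j], cur   # evict occupant of table 2
        self.stash.append(cur)   # insertion failed: item lands in the stash

    def lookup(self, key):
        return (self.t1[self._h1(key)] == key or
                self.t2[self._h2(key)] == key or
                key in self.stash)
```

    Every item sits in one of its two table slots or in the shared stash, so a lookup probes exactly two slots plus the stash; keeping the stash small and shared is what the paper's hierarchy exploits.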

    Existing oblivious storage solutions typically involve small amortized overheads for obscuring the access pattern, but nevertheless involve potentially huge variations in access times, depending on when they occur. To this end, we show how to de-amortize oblivious storage simulations so that each request takes a worst-case bounded amount of time.

    We also provide experimental results from a prototype implementation of our scheme. We show that the constants involved are small, resulting in at most 2 log n round trips for each client request. Finally, we demonstrate that the performance of our stateless scheme is comparable to that of a more powerful scheme in which the client is allowed to keep state.

    Joint work with: Michael T. Goodrich, Michael Mitzenmacher and Roberto Tamassia

    Friday, February 10:
    Differentially-Private Learning of Low Dimensional Manifolds
    Geetha Jagannathan
    Columbia University

    Abstract: We study the problem of differentially-private learning of low dimensional manifolds embedded in high dimensional space. We achieve the dual goals of learning the manifold while maintaining the privacy of the dataset by constructing a differentially-private data structure that adapts to the intrinsic dimension of the dataset. Our differentially-private manifold learning algorithm is based on random projection trees of Freund et al. A naive construction of differentially-private random projection trees would involve queries with high global sensitivity that will affect the usefulness of the trees. Instead, we present approximations of such queries. Our approximations have low sensitivity and are precise enough for learning the low dimensional manifold. We prove that the size of the tree depends only on the intrinsic dimension of the dataset.
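    For intuition about the non-private skeleton, here is a minimal random-projection-tree sketch in the spirit of Freund et al.: split on a random direction at the (exact) median of the projections. The exact median is precisely the kind of high-sensitivity statistic that the differentially-private algorithm replaces with a low-sensitivity approximation; all names and parameters below are invented for illustration.

```python
import random

def rp_tree(points, min_size=4, rng=random):
    """Toy random projection tree (non-private sketch).

    Recursively splits the point set at the median of projections onto
    a random Gaussian direction, stopping at small leaves.
    """
    n, d = len(points), len(points[0])
    if n <= min_size:
        return {"leaf": points}
    u = [rng.gauss(0, 1) for _ in range(d)]          # random direction
    proj = sorted(sum(p[i] * u[i] for i in range(d)) for p in points)
    thresh = proj[n // 2]   # exact median: high global sensitivity
    left  = [p for p in points if sum(p[i]*u[i] for i in range(d)) <= thresh]
    right = [p for p in points if sum(p[i]*u[i] for i in range(d)) >  thresh]
    if not left or not right:                         # degenerate split
        return {"leaf": points}
    return {"dir": u, "thresh": thresh,
            "left": rp_tree(left, min_size, rng),
            "right": rp_tree(right, min_size, rng)}

def leaf_count(node):
    # Total number of points stored at the leaves below `node`.
    if "leaf" in node:
        return len(node["leaf"])
    return leaf_count(node["left"]) + leaf_count(node["right"])
```

    The appeal of this structure for manifold data is that the depth needed to reach small-diameter cells scales with the intrinsic, not ambient, dimension, which is what the abstract's size bound reflects.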

    Joint work with Krzysztof Choromanski and Claire Monteleoni

    Friday, February 17:
    Computing a Nonnegative Matrix Factorization -- Provably
    Rong Ge
    Princeton University

    Abstract: In the Nonnegative Matrix Factorization (NMF) problem we are given an [;n \times m;] nonnegative matrix M and an integer r > 0. Our goal is to express M as AW where A and W are nonnegative matrices of size [;n \times r;] and [;r \times m;] respectively. In some applications, it makes sense to ask instead for the product AW to approximate M, i.e. (approximately) minimize [;||M-AW||_F;] where [;||\cdot||_F;] denotes the Frobenius norm; we refer to this as Approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where A and W are computed using a variety of local search heuristics. Vavasis proved that this problem is NP-complete. We initiate a study of when this problem is solvable in polynomial time:

    1. We give a polynomial-time algorithm for exact and approximate NMF for every constant r. Indeed NMF is most interesting in applications precisely when r is small.

    2. We complement this with a hardness result, that if exact NMF can be solved in time [;(nm)^{o(r)};], 3-SAT has a sub-exponential time algorithm. This rules out substantial improvements to the above algorithm.

    3. We give an algorithm that runs in time polynomial in n, m and r under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical, since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings. To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input, and we believe this will be an interesting and important direction for future work.
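    For contrast with the provable algorithms above, the local-search heuristics popular in machine learning look like the following Lee-Seung multiplicative-update sketch, which is known to keep [;||M-AW||_F;] from increasing but converges only to a local optimum. The function name, iteration count, and initialization are arbitrary choices for this illustration.

```python
import numpy as np

def nmf_multiplicative(M, r, iters=200, seed=0):
    """Lee-Seung multiplicative updates for approximate NMF.

    Heuristic local search for nonnegative A (n x r) and W (r x m)
    approximately minimizing ||M - AW||_F; no global guarantee.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r)) + 1e-3   # strictly positive initialization
    W = rng.random((r, m)) + 1e-3
    eps = 1e-12                     # guards against division by zero
    for _ in range(iters):
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ W @ W.T + eps)
    return A, W
```

    Because the updates only multiply by nonnegative ratios, A and W stay nonnegative throughout, which is the main attraction of this heuristic.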

    Joint work with Sanjeev Arora, Ravi Kannan, and Ankur Moitra

    Friday, February 24:
    Learning (and Testing) Structured Distributions
    Ilias Diakonikolas
    UC Berkeley

    Abstract: I will survey recent work (joint with Costis Daskalakis and Rocco Servedio) on efficiently learning (and testing) certain natural classes of discrete distributions. Time permitting, I will also highlight some interesting research directions.

    Friday, March 2:
    Computational Insights and the Theory of Evolution
    Christos Papadimitriou
    UC Berkeley

    Abstract: I shall discuss recent work (much of it joint with biologists Adi Livnat and Marcus Feldman) on some central problems in Evolution that was inspired and informed by computational ideas. Considerations about the performance of genetic algorithms led to a novel theory on the role of sex in Evolution based on the concept of mixability. And a natural random process on Boolean functions can help us understand better Waddington’s genetic assimilation phenomenon, in which an acquired trait becomes genetic.

    Friday, March 23:
    Rational Proof Systems and Scoring Rules
    Pablo Azar

    Abstract: We study a new type of proof system in which an unbounded prover and a polynomial-time verifier interact, on inputs a string x and a function f, so that the verifier may learn f(x). The novelty of our setting is that there are no longer "good" or "malicious" provers, but only rational ones. In essence, the verifier has a budget c and gives the prover a reward r [;\in;] [0, c] determined by the transcript of their interaction; the prover wishes to maximize his expected reward, and his reward is maximized only if the verifier correctly learns f(x).

    Rational proof systems are as powerful as their classical counterparts for polynomially many rounds of interaction, but they are much more powerful when we only allow a constant number of rounds. Indeed, we prove that if f [;\in;] #P, then f is computable by a one-round rational Merlin-Arthur game, where, on input x, Merlin's single message consists of simply sending the value f(x). Further, we prove that CH, the counting hierarchy, coincides with the class of languages computable by constant-round rational Merlin-Arthur games.

    Our results rely on a basic and crucial connection between rational proof systems and proper scoring rules, a tool developed to elicit truthful information from experts. Guided by our interest in verifiers who are computationally limited, we make two contributions to the theory of proper scoring rules. First, we prove that any deterministic, bounded scoring rule must make a number of queries to its input distribution that is linear in the number of states of the world; when this number of states is large, this is computationally infeasible. Second, we present a new generalization of scoring rules, sampling scoring rules, which the verifier can compute using only two queries to the input distribution. The proof of our lower bound leverages Turán's theorem, a classical result in extremal graph theory.
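    For intuition, the quadratic (Brier) rule is the standard example of a strictly proper scoring rule: reporting one's true beliefs maximizes expected reward. Note that evaluating it reads the reported probability of every state, illustrating the linear-in-states query cost discussed above. The two function names are invented for this sketch.

```python
def brier_score(report, outcome):
    # Quadratic (Brier) scoring rule: the reward for reporting the
    # distribution `report` (a dict state -> probability) when state
    # `outcome` is realized. Touches every state of the world.
    return 2 * report[outcome] - sum(p * p for p in report.values())

def expected_score(report, belief):
    # Expected reward of reporting `report` when the expert's true
    # distribution over states is `belief`.
    return sum(belief[s] * brier_score(report, s) for s in belief)
```

    Strict properness means expected_score(b, b) > expected_score(r, b) for any report r that differs from the true belief b.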

    Friday, April 6:
    Reconstructing Boolean Threshold Functions from their Average Satisfying Assignment
    Ilias Diakonikolas
    UC Berkeley

    Abstract: In the 2nd FOCS conference (1961), C.-K. Chow gave a surprising characterization of Boolean threshold functions (halfspaces): among all Boolean functions, each threshold function f is uniquely determined by the "center of mass" of its positive inputs together with the number of its positive inputs. These n+1 parameters of f (since known as its "Chow parameters") are equivalent to its degree-0 and degree-1 Fourier coefficients.

    The proof of Chow's theorem is highly non-constructive. We address the algorithmic problem of efficiently reconstructing a threshold function from its Chow parameters. This problem arises in various contexts and has been considered by researchers in electrical engineering, game theory, social choice and learning.

    I will describe a nearly-optimal algorithm to approximately reconstruct a threshold function from its Chow parameters. The algorithm and its analysis use tools from Fourier analysis, linear algebra, probability and learning.
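    Concretely, the Chow parameters are just the degree-0 and degree-1 Fourier coefficients, and for small n they can be computed by brute force. The enumeration below is purely illustrative (the talk's algorithm goes in the opposite, much harder direction: from the parameters back to the function); the function name and conventions are my own.

```python
from itertools import product

def chow_parameters(w, theta, n):
    """Chow parameters of the halfspace f(x) = sign(w.x - theta) on {-1,1}^n.

    Returns [E[f], E[f x_1], ..., E[f x_n]]: the degree-0 and degree-1
    Fourier coefficients, which by Chow's theorem determine f uniquely
    among threshold functions. Brute force over all 2^n inputs.
    """
    chow = [0.0] * (n + 1)
    for x in product([-1, 1], repeat=n):
        f = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1
        chow[0] += f
        for i in range(n):
            chow[i + 1] += f * x[i]
    return [c / 2 ** n for c in chow]
```

    For example, the 3-bit majority function (w = (1,1,1), theta = 0) has Chow parameters (0, 1/2, 1/2, 1/2).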

    The talk will be based on joint work with Anindya De, Vitaly Feldman and Rocco Servedio (to appear in STOC'12).

    Friday, April 13:
    Efficient Adaptive Querying Strategies for Clustering and Ordering Problems
    Nir Ailon

    Abstract: I will discuss two learning problems for which i.i.d. sampling techniques, as well as known adaptive sampling techniques developed for general "active learning" problems, are grossly suboptimal. The two problems are k-Correlation Clustering and Minimum Feedback Arc-Set in Tournaments. In both problems the instance space is the set of pairs of elements over a ground set V of n elements. In the first, the objective is to learn a partition of V into k disjoint sets; in other words, to learn a graph consisting of k disjoint cliques covering V. In the second, the objective is to learn an ordering of V; in other words, a transitive tournament graph. Our setting is agnostic in the sense that the observations are noisy, and we compete with the best fit. Our sampling strategy is iterative, producing an improved solution at each iteration. Each "current" solution defines a distribution, from which we sample a batch of (noisy) edges. The underlying empirical process allows uniform estimation of the difference between the cost of any solution and that of the current one, to within an error of epsilon times the distance between the solutions, for any epsilon. Minimizing this empirical process defines the next solution and allows a fast learning rate.

    I will describe this strategy and show some connections with the theory of matrix completion.

    Joint work with Esther Ezra and Ron Begleiter

    Friday, April 20:
    Planar Disjoint-Paths Completion
    Stavros Kolliopoulos
    University of Athens

    Abstract: Take any graph property represented by a collection P of graphs. The corresponding completion problem typically asks for the minimum number of edges to add to a graph so that it belongs to P.

    We introduce the completion version of Disjoint Paths on planar graphs. Given a plane graph G, k pairs of terminals, and a face F of G, find a minimum set of edges, if one exists, that can be added inside F so that the embedding remains planar and the pairs become connected by k disjoint paths in the augmented network.

    First, we give an explicit upper bound on the number of additional edges needed if a solution exists; this bound is a function of k alone, independent of the size n of G. Second, we show that the problem is fixed-parameter tractable, i.e., it can be solved in time f(k)·poly(n).

    Joint work with Isolde Adler and Dimitrios Thilikos

    Tuesday, May 8:
    Tamper and Leakage Resilience in the Split-State Model
    Anna Lysyanskaya
    Brown University

    Abstract: It is notoriously difficult to create hardware that is immune from side channel and tampering attacks. A lot of recent literature, therefore, has instead considered algorithmic defenses from such attacks.

    We show how to algorithmically secure any cryptographic functionality from continual split-state leakage and tampering attacks. A split-state attack on cryptographic hardware is one that targets separate parts of the hardware separately. Our construction does not require the hardware to have access to randomness. In contrast, prior work on protecting from continual combined leakage and tampering required true randomness for each update. Our construction is in the common reference string (CRS) model; the CRS must be hard-wired into the device. We note that prior negative results show that it is impossible to algorithmically secure a cryptographic functionality against a combination of arbitrary continual leakage and tampering attacks without true randomness; therefore restricting our attention to the split-state model is justified.

    Our construction is simple and modular, and relies on a new construction, in the CRS model, of non-malleable codes with respect to split-state tampering functions, which may be of independent interest.

    Joint work with Feng-Hao Liu

    Tuesday, May 15:
    On Security Notions for Functional Encryption
    Adam O'Neill
    Boston University

    Abstract: We introduce various security notions for functional encryption (FE), a concept put forth by Boneh, Sahai, and Waters (BSW). We study relations among these notions and their achievability.

    First, we show that indistinguishability (IND) and semantic security (SS) based notions of security are inequivalent for FE in general, in contrast to the classical setting of public-key encryption. Then, extending a result of BSW, we show that the SS notion is impossible to achieve due to its implicit incorporation of key-revealing selective-opening attacks (SOA-K). Finally, we put forth several relaxations of the SS notion to capture sans-SOA-K semantic security and show they are equivalent to IND for important functionalities including existing forms of predicate encryption (for which positive results under IND are known).

    Parts based on joint work with Mihir Bellare and Srdjan Krstic.

    Thursday, May 17:
    Solve, Walk, Cut
    Nisheeth Vishnoi
    Microsoft Research India

    Abstract: In this talk I will address the following question:

    Can one speed-up simulation of random walks given the ability to solve a system of linear equations quickly?

    Quickly computing probability distributions that arise from random walks on undirected graphs is a primitive useful in many combinatorial and algorithmic settings and has, more recently, been used to solve certain convex programs combinatorially.
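    The naive version of this primitive is just repeated sparse matrix-vector multiplication. A minimal sketch of a t-step lazy walk (all names are illustrative; this is the baseline the talk asks whether fast linear-system solvers can beat) makes the t passes over the edge set explicit:

```python
def lazy_walk_distribution(adj, start, t):
    """Distribution of a t-step lazy random walk on an undirected graph.

    adj: adjacency list {v: [neighbors]}. Each step the walk stays put
    with probability 1/2, otherwise moves to a uniformly random neighbor.
    Costs t passes over all edges.
    """
    p = {v: 0.0 for v in adj}
    p[start] = 1.0
    for _ in range(t):
        q = {v: 0.5 * p[v] for v in adj}        # lazy self-loop mass
        for v, nbrs in adj.items():
            share = 0.5 * p[v] / len(nbrs)      # mass sent to each neighbor
            for u in nbrs:
                q[u] += share
        p = q
    return p
```

    On a connected graph, this distribution converges to the stationary distribution proportional to vertex degrees, e.g., uniform on a triangle.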

    The answer to the above question will be revealed via an intricate story which stitches together methods from spectral graph theory, numerical linear algebra and approximation theory.

    As an application, I will highlight the first near-linear time approximation algorithm, that achieves the Cheeger bound, to cut a graph into two roughly equal parts while minimizing the conductance of the partition.

    The talk is based on joint work with Lorenzo Orecchia and Sushant Sachdeva; the paper is available online.

    Friday, May 18:
    From Irreducible Representations to Locally Decodable Codes
    Klim Efremenko
    Tel-Aviv University

    Abstract: A q-query Locally Decodable Code (LDC) is an error-correcting code that allows any particular symbol of the message to be recovered by reading only q symbols of the codeword.

    In this talk we present a new approach to the construction of LDCs from representation theory. We show that if there exists an irreducible representation ([;\rho;], V) of a group G and q elements [;g_1;], [;g_2;], ..., [;g_q;] in G such that some linear combination of the matrices [;\rho;]([;g_i;]) has rank one, then we can construct a q-query Locally Decodable Code C: V -> [;F^G;].

    We will show that both matching vector codes and Reed-Muller codes fall in this framework.

    No prior knowledge in representation theory will be assumed.

    Wednesday, May 23:
    On the Complexity of Trial and Error
    Shengyu Zhang
    Chinese University of Hong Kong

    Abstract: Motivated by certain applications from physics, biochemistry, economics, and computer science in which the objects under investigation are unknown or not directly accessible because of various limitations, we propose a trial-and-error model to examine search problems with unknown inputs. Given a search problem with a hidden input, we are asked to find a valid solution. The way to find such a solution is to propose candidate solutions, i.e., trials, and, using observed violations, i.e., errors, to prepare future proposals. In accordance with our motivating applications, we consider a fairly broad class of constraint satisfaction problems, and assume that errors are signaled by a verification oracle in the format of the index of a violated constraint (with the exact content of the constraint still hidden). The objective is to design time- and/or trial-efficient algorithms that will find a valid solution or alternatively, to show that the problem is intrinsically hard.

    On the one hand, despite the seemingly very little information provided by the verification oracle, we show that efficient algorithms do exist for a number of important problems. For the Nash, Core, Stable Matching, and SAT problems, the unknown-input versions are as hard as the corresponding known-input versions, up to polynomial factors. We further conduct a closer study of the latter two problems and give almost-tight bounds on their trial complexities. The techniques employed to prove these results vary considerably, including, e.g., order theory and the ellipsoid method with a strong separation oracle.

    On the other hand, there are problems whose complexities are substantially increased in the unknown-input model. For Graph Isomorphism and Group Isomorphism, in particular, although there are trial-efficient algorithms, no time-efficient algorithms exist (unless PH collapses and P = NP, respectively). These results also imply lower bounds on the tradeoff between time and trial complexities. The proofs use quite nonstandard reductions, in which an efficient simulator is carefully designed to simulate a desirable but computationally unaffordable oracle.

    Our model investigates the value of information, and our results demonstrate that the lack of input information can introduce various levels of extra difficulty. The model accommodates a wide range of combinatorial and algebraic structures, and exhibits intimate connections with (and we hope can also serve as a useful supplement to) certain existing learning and complexity theories.

    Monday, July 2:
    The Inverse Shapley Value Problem
    Anindya De
    UC Berkeley

    Abstract: For f a weighted voting scheme used by n voters to choose between two candidates, the n Shapley-Shubik Indices of f provide a measure of how much control each voter can exert over the overall outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley and Martin Shubik in 1954 and are widely studied in social choice theory as a measure of the "influence" of voters. The Inverse Shapley Value Problem is the problem of designing a weighted voting scheme which (approximately) achieves a desired input vector of values for the Shapley indices. Despite much interest in this problem no provably correct and efficient algorithm was known prior to our work.

    We give the first efficient algorithm with provable performance guarantees for the Inverse Shapley Value Problem. For any constant [;\epsilon;] > 0 our algorithm runs in fixed poly(n) time (the degree of the polynomial is independent of [;\epsilon;]) and has the following performance guarantee: given as input a vector of desired Shapley indices, if any "reasonable" weighted voting scheme (roughly, one in which the threshold is not too skewed) approximately matches the desired vector of values to within some small error, then our algorithm explicitly outputs a weighted voting scheme that achieves this vector of Shapley indices to within error [;\epsilon;]. If there is a "reasonable" voting scheme in which all voting weights are integers at most poly(n) that approximately achieves the desired Shapley indices, then our algorithm runs in time poly(n) and outputs a weighted voting scheme that achieves the target vector of Shapley indices to within error [;\epsilon;] = [;n^{-1/8};].
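    For the forward (non-inverse) direction, the indices themselves have a simple definition: voter i's Shapley-Shubik index is the fraction of orderings of the voters in which i's vote is pivotal, i.e., first pushes the running weight total to the threshold. A brute-force sketch (exponential in n, purely illustrative; the function name is my own) is:

```python
from itertools import permutations
from math import factorial

def shapley_shubik(weights, threshold):
    """Exact Shapley-Shubik indices of a weighted voting game.

    Enumerates all n! orderings; voter i is pivotal in an ordering if
    adding i's weight first makes the running total reach the threshold.
    Feasible only for small n.
    """
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        total = 0
        for i in order:
            total += weights[i]
            if total >= threshold:
                pivots[i] += 1
                break
    return [p / factorial(n) for p in pivots]
```

    For instance, with weights (3, 1, 1) and threshold 3, voter 0 is a dictator and gets index 1 while the others get 0; with three equal voters and threshold 2, each index is 1/3.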

    Joint work with Ilias Diakonikolas and Rocco Servedio.

    Contact xichen-at-cs-dot-columbia-dot-edu if you want to volunteer to give a talk (students are especially encouraged!). The talk can be about your own work or others'. It can be anything from a polished presentation of a completed result to an informal blackboard presentation of an interesting topic where you are stuck on some open problem. It should be accessible to a general theory audience. I will be happy to help students choose papers to talk about.
    There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here. If you want to unsubscribe or change your options, send email to with the word `help' in the subject or body (don't include the quotes), and you will get back a message with instructions.

    Comments on this page are welcome; please send them to

    Back to Theory of Computation at Columbia main page