For Spring 2012, meetings will usually be held on Fridays, 12:00 - 13:30, in the CS conference room, CSB 453. Abstracts for the talks are given below the talk schedule.

Arnab Bhattacharyya

Princeton University & CCI

**Abstract:**
Invariance with respect to linear or affine transformations of the
domain is arguably the most common symmetry exhibited by
natural algebraic properties. Here, we will show that any low
complexity affine-invariant property of multivariate functions
over finite fields is testable with a constant number of queries.
This immediately reproves, for instance, that the Reed-Muller
code over [;F_p;] of degree d [;<;] p is testable, with an argument
that uses no detailed algebraic information about polynomials
except that low degree is preserved by composition with affine
maps.

The complexity of an affine-invariant property P refers to the maximum complexity, as defined by Green and Tao (Ann. Math. 2008), of the sets of linear forms used to characterize P. A more precise statement of our main result is that for any fixed prime p [;\geq;] 2 and fixed integer R [;\geq;] 2, any affine-invariant property P of functions f: [;F_p^n;] [;\rightarrow;] [R] is testable, assuming the complexity of the property is less than p. Our proof involves developing analogs of graph-theoretic techniques in an algebraic setting, using tools from higher-order Fourier analysis.
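
As a concrete warm-up (outside the talk itself), the simplest test of an affine-invariant property is the classic Blum-Luby-Rubinfeld linearity test over [;F_2;]: linearity is preserved by invertible linear maps of the domain, and the tester checks f(x) + f(y) = f(x + y) on random pairs. The sketch below uses names of my own choosing.

```python
import random

def blr_test(f, n, trials=100):
    """Accept iff f(x) + f(y) == f(x + y) (mod 2) on all sampled pairs x, y."""
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [random.randint(0, 1) for _ in range(n)]
        xy = [(a + b) % 2 for a, b in zip(x, y)]
        if (f(x) + f(y)) % 2 != f(xy):
            return False
    return True

n = 8
a = [random.randint(0, 1) for _ in range(n)]
linear = lambda x: sum(ai * xi for ai, xi in zip(a, x)) % 2   # a genuine linear form
not_linear = lambda x: (x[0] * x[1] + x[2]) % 2               # degree 2, far from linear

print(blr_test(linear, n))      # always accepts a linear function
print(blr_test(not_linear, n))  # rejects with overwhelming probability
```

The tester makes only a constant number of queries per trial and uses no structure of f beyond the linearity equation, which is the flavor of "no detailed algebraic information" the abstract refers to.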

Joint work with Eldar Fischer (Technion) and Shachar Lovett (IAS)

Olga Ohrimenko

Brown University

**Abstract:**
The "cloud" is emerging as the next generation of data storage where
users can remotely keep their data and leave its management to a third
party, e.g. Amazon S3 or Microsoft Azure. However, the fact that users
no longer have physical possession of the data raises new challenges
in terms of data privacy. Storing the data in encrypted form is a key
component in maintaining the privacy of the data. However, encrypting
the data is not enough since information about the data may be leaked
through the pattern in which users access the data. We show how to
achieve efficient privacy-preserving data access using a combination
of probabilistic encryption, which directly hides data values, and
stateless oblivious RAM simulation, which hides the pattern of data
accesses.

In this talk we propose a method with O(log n) worst-case access overhead for simulating users' requests to outsourced data of size n, using a scheme that is data-oblivious with very high probability. We assume that the simulation has access to a small private workspace but does not maintain state in between data access requests. Our simulation makes use of pseudo-random hash functions and is based on a novel hierarchy of cuckoo hash tables that all share a common stash. The method outperforms all previous techniques for stateless clients in terms of access overhead.

Existing oblivious storage solutions typically involve small amortized overheads for obscuring the access pattern, but nevertheless involve potentially huge variations in access times, depending on when they occur. To this end, we show how to de-amortize oblivious storage simulations so that each request takes a worst-case bounded amount of time.

We also provide experimental results from a prototype implementation of our scheme. We show that the constants involved are small, resulting in at most 2 log n round trips for each client request. Finally, we demonstrate that the performance of our stateless scheme is comparable to that of a more powerful scheme in which the client is allowed to keep state.
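
As background for the data structure (an illustrative sketch, not the paper's full hierarchy): in cuckoo hashing each key has two candidate slots, insertions cascade by eviction between two tables, and a small stash absorbs the rare insertions that fail. The talk's construction shares one such stash across a hierarchy of cuckoo tables; the class names and PRF stand-ins below are mine.

```python
import random

class CuckooWithStash:
    def __init__(self, size, max_kicks=32):
        self.size = size
        self.t1 = [None] * size
        self.t2 = [None] * size
        self.stash = []
        self.max_kicks = max_kicks
        self.s1, self.s2 = random.random(), random.random()  # stand-ins for PRF seeds

    def _h1(self, key): return hash((self.s1, key)) % self.size
    def _h2(self, key): return hash((self.s2, key)) % self.size

    def insert(self, key, value):
        item = (key, value)
        for _ in range(self.max_kicks):
            i = self._h1(item[0])
            if self.t1[i] is None:
                self.t1[i] = item; return
            item, self.t1[i] = self.t1[i], item   # evict occupant of table 1
            j = self._h2(item[0])
            if self.t2[j] is None:
                self.t2[j] = item; return
            item, self.t2[j] = self.t2[j], item   # evict occupant of table 2
        self.stash.append(item)                   # give up: park in the stash

    def lookup(self, key):
        for slot in (self.t1[self._h1(key)], self.t2[self._h2(key)]):
            if slot is not None and slot[0] == key:
                return slot[1]
        for k, v in self.stash:                   # always scan the stash too
            if k == key:
                return v
        return None

table = CuckooWithStash(size=64)
for k in range(40):
    table.insert(k, k * k)
print(table.lookup(25))  # → 625
```

A lookup probes exactly two table slots plus the small stash, which is what makes the access pattern easy to obfuscate and the round-trip count low.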

Joint work with: Michael T. Goodrich, Michael Mitzenmacher and Roberto Tamassia

Geetha Jagannathan

Columbia University

**Abstract:**
We study the problem of differentially-private learning of low
dimensional manifolds embedded in high dimensional space.
We achieve the dual goals of learning the manifold while
maintaining the privacy of the dataset by constructing a
differentially-private data structure that adapts to the intrinsic
dimension of the dataset. Our differentially-private manifold
learning algorithm is based on random projection trees of
Freund et al. A naive construction of differentially-private
random projection trees would involve queries with high global
sensitivity that would affect the usefulness of the trees. Instead,
we present approximations of such queries. Our approximations
have low sensitivity and are precise enough for learning the low
dimensional manifold. We prove that the size of the tree depends
only on the intrinsic dimension of the dataset.
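
For orientation, here is a sketch of the non-private building block the talk starts from: one split of a random projection tree (Freund et al.), which projects points onto a random Gaussian direction and splits near the median projection. The actual RP tree perturbs the split point, and the talk's contribution is replacing the high-sensitivity median and count queries with differentially-private, low-sensitivity approximations; neither refinement is shown here, and the function names are mine.

```python
import random

def rp_tree_split(points, dim):
    # project every point onto a random Gaussian direction
    direction = [random.gauss(0, 1) for _ in range(dim)]
    proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
    threshold = sorted(proj)[len(proj) // 2]      # split at the median projection
    left = [p for p, v in zip(points, proj) if v <= threshold]
    right = [p for p, v in zip(points, proj) if v > threshold]
    return left, right

pts = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]
left, right = rp_tree_split(pts, 10)
print(len(left), len(right))  # a roughly balanced split
```

Because the direction is data-independent and only the threshold depends on the data, the median query is the natural place where privacy noise must be injected.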

Joint work with Krzysztof Choromanski and Claire Monteleoni

Rong Ge

Princeton University

**Abstract:**
In the Nonnegative Matrix Factorization (NMF) problem we are given
an [;n \times m;] nonnegative matrix M and an integer r > 0. Our
goal is to express M as AW where A and W are nonnegative
matrices of size [;n \times r;] and [;r \times m;] respectively. In some
applications, it makes sense to ask instead for the product AW to
approximate M, i.e. (approximately) minimize [;||M-AW||_F;]
where [;||\cdot||_F;] denotes the Frobenius norm; we refer to this as
Approximate NMF. This problem has a rich history spanning quantum
mechanics, probability theory, data analysis, polyhedral combinatorics,
communication complexity, demography, chemometrics, etc. In the past
decade NMF has become enormously popular in machine learning,
where A and W are computed using a variety of local search
heuristics. Vavasis proved that this problem is NP-complete. We
initiate a study of when this problem is solvable in polynomial time:

1. We give a polynomial-time algorithm for exact and approximate NMF for every constant r. Indeed, NMF is most interesting in applications precisely when r is small.

2. We complement this with a hardness result: if exact NMF can be solved in time [;(nm)^{o(r)};], then 3-SAT has a sub-exponential time algorithm. This rules out substantial improvements to the above algorithm.

3. We give an algorithm that runs in time polynomial in n, m and r under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings. To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input, and we believe that this will be an interesting and important direction for future work.
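
To make the objective concrete, here is a pure-Python sketch of the Lee-Seung multiplicative-update heuristic, the kind of local search the abstract says practitioners use (it carries no global guarantee, in contrast to the talk's algorithms; all names and the example matrix are mine).

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(M, r, iters=500, eps=1e-9):
    """Approximate M (n x m, nonnegative) as A (n x r) times W (r x m)."""
    n, m = len(M), len(M[0])
    A = [[random.random() for _ in range(r)] for _ in range(n)]
    W = [[random.random() for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # W <- W * (A^T M) / (A^T A W), elementwise (keeps W nonnegative)
        AtM, AtAW = matmul(transpose(A), M), matmul(matmul(transpose(A), A), W)
        W = [[W[i][j] * AtM[i][j] / (AtAW[i][j] + eps) for j in range(m)] for i in range(r)]
        # A <- A * (M W^T) / (A W W^T), elementwise (keeps A nonnegative)
        MWt, AWWt = matmul(M, transpose(W)), matmul(A, matmul(W, transpose(W)))
        A = [[A[i][j] * MWt[i][j] / (AWWt[i][j] + eps) for j in range(r)] for i in range(n)]
    return A, W

def frob_err(M, A, W):
    AW = matmul(A, W)
    return sum((M[i][j] - AW[i][j]) ** 2
               for i in range(len(M)) for j in range(len(M[0]))) ** 0.5

# This M has an exact nonnegative rank-2 factorization, so the residual shrinks.
M = matmul([[1, 0], [0, 1], [1, 1], [2, 1]], [[1, 2, 0], [0, 1, 3]])
A, W = nmf(M, r=2)
print(frob_err(M, A, W))  # small Frobenius residual
```

The updates monotonically decrease [;||M-AW||_F;] but can stall at local minima, which is exactly the gap between heuristics and the provable algorithms of the talk.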

Joint work with Sanjeev Arora, Ravi Kannan, and Ankur Moitra

Ilias Diakonikolas

UC Berkeley

**Abstract:**
I will survey recent work (joint with Costis Daskalakis and
Rocco Servedio) on efficiently learning (and testing) certain
natural classes of discrete distributions. Given time, I will
also highlight some of the interesting research directions.

Christos Papadimitriou

UC Berkeley

**Abstract:**
I shall discuss recent work (much of it joint with biologists Adi Livnat
and Marcus Feldman) on some central problems in Evolution that was
inspired and informed by computational ideas. Considerations about
the performance of genetic algorithms led to a novel theory on the role
of sex in Evolution based on the concept of mixability. And a natural
random process on Boolean functions can help us understand better
Waddington's genetic assimilation phenomenon, in which an acquired
trait becomes genetic.

Pablo Azar

MIT

**Abstract:**
We study a new type of proof system, where an unbounded prover and a polynomial-time
verifier interact, on inputs a string x and a function f, so that the Verifier may learn f(x).
The novelty of our setting is that there no longer are "good" or "malicious" provers, but only
rational ones. In essence, the Verifier has a budget c and gives the Prover a reward r [;\in;] [0, c]
determined by the transcript of their interaction; the Prover wishes to maximize his expected
reward; and his reward is maximized only if the Verifier correctly learns f(x).

Rational proof systems are as powerful as their classical counterparts for polynomially many rounds of interaction, but are much more powerful when we only allow a constant number of rounds. Indeed, we prove that if f [;\in;] #P, then f is computable by a one-round rational Merlin-Arthur game, where, on input x, Merlin's single message actually consists of sending just the value f(x). Further, we prove that CH, the counting hierarchy, coincides with the class of languages computable by a constant-round rational Merlin-Arthur game.

Our results rely on a basic and crucial connection between rational proof systems and proper scoring rules, a tool developed to elicit truthful information from experts. Guided by our interest in verifiers who are computationally limited, we make two contributions to the theory of proper scoring rules. First, we prove that any deterministic, bounded scoring rule must make a number of queries to its input distribution that is linear in the number of states of the world. When this number of states is large, this is computationally infeasible. Second, we show a new generalization of scoring rules, sampling scoring rules, which allow the verifier to compute them using only two queries to the input distribution. The proof of our lower bound on the number of samples leverages Turán's theorem, a classical result in extremal graph theory.
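
The scoring-rule connection can be illustrated (outside the paper's new sampling rules) with the classic quadratic/Brier rule, which is strictly proper: the expected reward under the true distribution is uniquely maximized by reporting that distribution, which is what lets a verifier pay a rational prover for the truth. The example values below are mine.

```python
def brier_score(q, w):
    # quadratic (Brier) rule: reward 2*q_w - sum_x q_x^2 when outcome w occurs;
    # it is bounded and strictly proper
    return 2 * q[w] - sum(x * x for x in q)

def expected_score(p, q):
    # expected reward for reporting q when the true distribution is p
    return sum(p[w] * brier_score(q, w) for w in range(len(p)))

p = [0.7, 0.2, 0.1]               # true distribution over 3 states of the world
truthful = expected_score(p, p)
for q in ([0.5, 0.3, 0.2], [1.0, 0.0, 0.0], [1/3, 1/3, 1/3]):
    assert expected_score(p, q) < truthful   # every lie scores strictly worse
print(truthful)
```

Note that evaluating the rule exactly requires summing over all states, which is the query-complexity obstacle the paper's lower bound formalizes.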

Ilias Diakonikolas

UC Berkeley

**Abstract:**
In the 2nd FOCS conference (1961) C.-K. Chow gave a surprising
characterization of Boolean threshold functions (halfspaces). Among
all Boolean functions, each threshold function f is uniquely determined
by the "center of mass" of its positive inputs, and the number of positive
inputs. These n+1 parameters of f (known since as "Chow parameters")
are equivalent to its degree-0 and degree-1 Fourier coefficients.

The proof of Chow's theorem is highly non-constructive. We address the algorithmic problem of efficiently reconstructing a threshold function from its Chow parameters. This problem arises in various contexts and has been considered by researchers in electrical engineering, game theory, social choice and learning.

I will describe a nearly-optimal algorithm to approximately reconstruct a threshold function from its Chow parameters. The algorithm and its analysis use tools from Fourier analysis, linear algebra, probability and learning.
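
For concreteness, the easy (forward) direction can be computed by enumeration: the n+1 Chow parameters of a halfspace over {-1,1}^n are its degree-0 and degree-1 Fourier coefficients. Chow's theorem says these numbers determine f among all Boolean functions; the talk is about the much harder reconstruction direction. The example halfspace below is mine.

```python
from itertools import product

def chow_parameters(f, n):
    """Return [E[f], E[f(x) x_1], ..., E[f(x) x_n]] over uniform x in {-1,1}^n."""
    params = [0.0] * (n + 1)
    for x in product([-1, 1], repeat=n):
        fx = f(x)
        params[0] += fx                 # degree-0 Fourier coefficient: E[f]
        for i in range(n):
            params[i + 1] += fx * x[i]  # degree-1 coefficients: E[f(x) x_i]
    return [p / 2 ** n for p in params]

halfspace = lambda x: 1 if 2 * x[0] + x[1] + x[2] - 0.5 >= 0 else -1
print(chow_parameters(halfspace, 3))  # → [-0.25, 0.75, 0.25, 0.25]
```

The brute-force loop takes 2^n time; the inverse problem the talk addresses is to go from such a vector back to weights and a threshold efficiently.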

The talk will be based on joint work with Anindya De, Vitaly Feldman and Rocco Servedio (to appear in STOC'12).

Nir Ailon

Technion

**Abstract:**
I will discuss two learning problems for which iid sampling techniques,
as well as known adaptive sampling techniques developed for general
"active learning" problems are grossly suboptimal. The two problems
are k-Correlation Clustering, and Minimum Feedback Arc-Set in
Tournaments. In both problems the instance space is the set of pairs
of elements over a ground set V of n elements. In the first, the objective
is to learn a partition of V into k disjoint sets, in other words, to learn a
graph containing k disjoint cliques covering V. In the second, the
objective is to learn an ordering of V, in other words, a transitive
tournament graph. Our setting is agnostic in the sense that the
observations are noisy, and we compete with the best fit. Our sampling
strategy is iterative, producing an improved solution at each iteration.
Each "current" solution defines a distribution, from which we sample a
batch of (noisy) edges. The underlying empirical process allows
uniform estimation of the difference between the cost of any solution
and the current one to within an error of epsilon times the distance
between the solutions, for any epsilon. Minimizing this empirical
process defines the next solution, and allows a fast learning rate.

I will describe this strategy and show some connections with the theory of matrix completion.

Joint work with Esther Ezra and Ron Begleiter

Stavros Kolliopoulos

University of Athens

**Abstract:**
Take any graph property represented by a collection P of graphs. The
corresponding completion problem typically asks for the minimum number
of edges to add to a graph so that it belongs to P.

We introduce the completion version of Disjoint Paths on planar graphs. Given a plane graph G, k pairs of terminals, and a face F of G, find the minimum set of edges, if one exists, to be added inside F so that: the embedding remains planar and the pairs become connected by k disjoint paths in the augmented network.

First, we give an explicit upper bound on the number of additional edges needed if a solution exists; this bound is a function of k, independent of the size n of G. Second, we show that the problem is fixed-parameter tractable, i.e., it can be solved in time f(k)poly(n).

Joint work with Isolde Adler and Dimitrios Thilikos

Anna Lysyanskaya

Brown University

**Abstract:**
It is notoriously difficult to create hardware that is immune from side
channel and tampering attacks. A lot of recent literature, therefore,
has instead considered algorithmic defenses from such attacks.

We show how to algorithmically secure any cryptographic functionality from continual split-state leakage and tampering attacks. A split-state attack on cryptographic hardware is one that targets separate parts of the hardware separately. Our construction does not require the hardware to have access to randomness. In contrast, prior work on protecting from continual combined leakage and tampering required true randomness for each update. Our construction is in the common reference string (CRS) model; the CRS must be hard-wired into the device. We note that prior negative results show that it is impossible to algorithmically secure a cryptographic functionality against a combination of arbitrary continual leakage and tampering attacks without true randomness; therefore restricting our attention to the split-state model is justified.

Our construction is simple and modular, and relies on a new construction, in the CRS model, of non-malleable codes with respect to split-state tampering functions, which may be of independent interest.

Joint work with Feng-Hao Liu

Adam O'Neill

Boston University

**Abstract:** We introduce various security notions for functional
encryption (FE), a concept put forth by Boneh, Sahai, and Waters
(BSW). We study relations among these notions and their achievability.

First, we show that indistinguishability (IND) and semantic security
(SS) based notions of security are **inequivalent** for FE in general, in
contrast to the classical setting of public-key encryption. Then,
extending a result of BSW, we show that the SS notion is impossible to
achieve due to its implicit incorporation of key-revealing
selective-opening attacks (SOA-K). Finally, we put forth several
relaxations of the SS notion to capture sans-SOA-K semantic security
and show they are equivalent to IND for important functionalities
including existing forms of predicate encryption (for which positive
results under IND are known).

Parts based on joint work with Mihir Bellare and Srdjan Krstic.

Nisheeth Vishnoi

Microsoft Research India

**Abstract:** In this talk I will address the following question:

Can one speed-up simulation of random walks given the ability to solve a system of linear equations quickly?

Quickly computing probability distributions that arise from random walks on undirected graphs is a primitive useful in many combinatorial and algorithmic settings and has, more recently, been used in solving certain convex programs combinatorially.
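
To fix ideas, here is the naive primitive being sped up: the t-step lazy-walk distribution computed by t sparse matrix-vector products. (A sketch with my own names; the talk's contribution, approximating this via fast linear-system solves, is not shown.)

```python
def lazy_walk_step(adj, p):
    """One step of the 1/2-lazy random walk: keep half the mass, spread half."""
    n = len(adj)
    q = [0.0] * n
    for u in range(n):
        deg = len(adj[u])
        for v in adj[u]:
            q[v] += p[u] / (2 * deg)   # move half the mass along edges
        q[u] += p[u] / 2               # keep half the mass in place
    return q

# 4-cycle; start with all probability mass at vertex 0
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
p = [1.0, 0.0, 0.0, 0.0]
for _ in range(100):
    p = lazy_walk_step(adj, p)
print(p)  # converges to the uniform stationary distribution
```

Each step costs O(|E|), so t steps cost O(t|E|); replacing the t-fold product with a small number of linear-system solves is exactly the speed-up the question asks about.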

The answer to the above question will be revealed via an intricate story which stitches together methods from spectral graph theory, numerical linear algebra and approximation theory.

As an application, I will highlight the first near-linear time approximation algorithm achieving the Cheeger bound for cutting a graph into two roughly equal parts while minimizing the conductance of the partition.

The talk will be based on joint work with Lorenzo Orecchia and Sushant Sachdeva and is available at http://arxiv.org/abs/1111.1491.

Klim Efremenko

Tel-Aviv University

**Abstract:** A q-query Locally Decodable Code (LDC) is an error-correcting code that
allows one to read any particular symbol of the message by reading only q
symbols of the codeword.

In this talk we present a new approach to the construction of LDCs from representation theory. We show that if there exists an irreducible representation ([;\rho;], V) of a group G and q elements [;g_1;], [;g_2;], ..., [;g_q;] in G such that some linear combination of the matrices [;\rho;]([;g_i;]) has rank one, then we can construct a q-query Locally Decodable Code C: V [;\rightarrow;] [;F^G;].

We will show that both matching vector codes and Reed-Muller codes fall in this framework.

No prior knowledge in representation theory will be assumed.
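
For orientation, the textbook 2-query Hadamard code (a standard example, not the talk's new construction) shows what local decodability means: the codeword lists the inner products <m, x> for every x in [;F_2^n;], and bit i of the message is recovered by querying two positions, x and x + e_i, and XORing the answers. The function names below are mine.

```python
import random

def hadamard_encode(m):
    """Codeword entry at position x is <m, x> over F_2."""
    n = len(m)
    return [sum(mi * ((x >> i) & 1) for i, mi in enumerate(m)) % 2
            for x in range(2 ** n)]

def decode_bit(codeword, n, i, corrupt=frozenset()):
    """Recover m_i from just two (possibly corrupted) codeword positions."""
    x = random.randrange(2 ** n)
    y = x ^ (1 << i)                 # the position x + e_i
    qx = codeword[x] ^ 1 if x in corrupt else codeword[x]
    qy = codeword[y] ^ 1 if y in corrupt else codeword[y]
    return qx ^ qy                   # <m, x> + <m, x + e_i> = m_i

m = [1, 0, 1, 1]
c = hadamard_encode(m)
# flip one codeword position; each decoded bit is still correct w.h.p.
print([decode_bit(c, 4, i, corrupt={3}) for i in range(4)])
```

Because the two queried positions are individually uniform, a constant fraction of corrupted positions is only hit with small probability, which is the defining LDC property.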

Shengyu Zhang

Chinese University of Hong Kong

**Abstract:** Motivated by certain applications from physics, biochemistry, economics, and computer science in which the objects under investigation are unknown or not directly accessible because of various limitations, we propose a trial-and-error model to examine search problems with *unknown* inputs. Given a search problem with a hidden input, we are asked to find a valid solution. The way to find such a solution is to propose candidate solutions, i.e., *trials*, and, using observed violations, i.e., *errors*, to prepare future proposals. In accordance with our motivating applications, we consider a fairly broad class of constraint satisfaction problems, and assume that errors are signaled by a verification oracle in the form of the index of a violated constraint (with the exact content of the constraint still hidden). The objective is to design time- and/or trial-efficient algorithms that will find a valid solution or, alternatively, to show that the problem is intrinsically hard.

On the one hand, despite the seemingly very little information provided by the verification oracle, we show that efficient algorithms do exist for a number of important problems. For the Nash, Core, Stable Matching, and SAT problems, the unknown-input versions are as hard as the corresponding known-input versions, up to a polynomial factor. We further conduct a closer study of the latter two problems and give almost tight bounds on their trial complexities. The techniques employed to prove these results vary considerably, including, e.g., order theory and the ellipsoid method with a strong separation oracle.

On the other hand, there are problems whose complexities are substantially increased in the unknown-input model. For Graph Isomorphism and Group Isomorphism, in particular, although there are trial-efficient algorithms, no time-efficient algorithms exist (unless PH collapses and P = NP, respectively). These results also imply lower bounds on the tradeoff between time and trial complexities. The proofs use quite nonstandard reductions, in which an efficient simulator is carefully designed to simulate a desirable but computationally unaffordable oracle.

Our model investigates the value of information, and our results demonstrate that the lack of input information can introduce various levels of extra difficulty. The model accommodates a wide range of combinatorial and algebraic structures, and exhibits intimate connections with (and we hope can also serve as a useful supplement to) certain existing learning and complexity theories.

Anindya De

UC Berkeley

**Abstract:** For f a weighted voting scheme used by n voters to choose between two candidates, the n Shapley-Shubik indices of f provide a measure of how much control each voter can exert over the overall outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley and Martin Shubik in 1954 and are widely studied in social choice theory as a measure of the "influence" of voters. The Inverse Shapley Value Problem is the problem of designing a weighted voting scheme which (approximately) achieves a desired input vector of values for the Shapley indices. Despite much interest in this problem, no provably correct and efficient algorithm was known prior to our work.

We give the first efficient algorithm with provable performance guarantees for the Inverse Shapley Value Problem. For any constant [;\epsilon > 0;] our algorithm runs in fixed poly(n) time (the degree of the polynomial is independent of [;\epsilon;]) and has the following performance guarantee: given as input a vector of desired Shapley indices, if any "reasonable" weighted voting scheme (roughly, one in which the threshold is not too skewed) approximately matches the desired vector of values to within some small error, then our algorithm explicitly outputs a weighted voting scheme that achieves this vector of Shapley indices to within error [;\epsilon;]. If there is a "reasonable" voting scheme in which all voting weights are integers at most poly(n) that approximately achieves the desired Shapley indices, then our algorithm runs in time poly(n) and outputs a weighted voting scheme that achieves the target vector of Shapley indices to within error [;\epsilon = n^{-1/8};].
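
For concreteness, the forward direction is easy for small games: voter i's Shapley-Shubik index is the fraction of the n! voter orderings in which i is pivotal, i.e., tips the running weight past the threshold. The brute-force sketch below (example game and names are mine) goes from weights to indices; the talk's inverse problem goes the other way.

```python
from itertools import permutations
from math import factorial

def shapley_shubik(weights, threshold):
    """Brute-force Shapley-Shubik indices of the game [threshold; w_1..w_n]."""
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        total = 0
        for voter in order:
            total += weights[voter]
            if total >= threshold:      # this voter tipped the coalition
                pivots[voter] += 1
                break
    return [p / factorial(n) for p in pivots]

# Classic game [51; 50, 49, 1]: the 49-weight and 1-weight voters turn out to
# be equally powerful despite very different weights.
indices = shapley_shubik([50, 49, 1], 51)
print(indices)
```

The n! enumeration is only feasible for tiny n, which is one way to see why an efficient inverse algorithm is nontrivial.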

Joint work with Ilias Diakonikolas and Rocco Servedio.

Contact

There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here. If you want to unsubscribe or change your options, send email to theoryread-request@lists.cs.columbia.edu with the word `help' in the subject or body (don't include the quotes), and you will get back a message with instructions.

Comments on this page are welcome; please send them to
`xichen-at-cs.columbia.edu`