Columbia Theory Reading Group, Spring 2005

Abstracts for the talks are given below. See the main page for schedule and location information.

Thursday February 17:
Cryptography in NC^0 (Appelbaum, Ishai, Kushilevitz)
Homin Lee

We present Applebaum, Ishai, and Kushilevitz's paper on computing cryptographic primitives such as one-way functions and pseudorandom generators using NC0 circuits. The results make use of the machinery of randomizing polynomials developed by Ishai and Kushilevitz.

Tuesday February 22:
AdWords and Generalized Online Matching
Umesh Vazirani

Search engine companies, such as Google, Yahoo and MSN, have revolutionized the use of the Internet by individuals. Their dramatic revenues are supported by a second revolution: in the way businesses advertise to consumers.

I will talk about this lesser known revolution and the underlying computational problem it raises. This problem is an elegant generalization of the online bipartite matching problem. I will describe two optimal algorithms for this task, each achieving a competitive ratio of 1-1/e. The design of these algorithms is made possible by the new notion of tradeoff-revealing linear programs.

Joint work with Aranyak Mehta, Amin Saberi and Vijay Vazirani.

Thursday February 24:
Approximation Algorithms for Stochastic Optimization
Chaitanya Swamy

Stochastic optimization problems attempt to model uncertainty in the data by assuming that (part of) the input is specified by a probability distribution, rather than by deterministic data given in advance. We consider the well-studied paradigm of stochastic recourse models, where the uncertainty evolves through a series of stages and one can take decisions in each stage in response to the new information learned. We obtain the first approximation algorithms for a variety of 2-stage and k-stage stochastic integer optimization problems where the underlying random data is given by a "black box" and no restrictions are placed on the costs of the two stages: one can merely sample data from this distribution, but no direct information about the distributions is given.

Our results are based on two principal components. First, we give a fully polynomial approximation scheme for solving a broad class of 2-stage and k-stage linear programs, where k is not part of the input. This is based on reformulating the stochastic linear program, which has both an exponential number of variables and an exponential number of constraints, as a compact convex program, and adapting tools from convex optimization to solve the resulting program to near optimality. In doing so, a significant difficulty that we must overcome is that even evaluating the objective function of this convex program at a given point may be provably hard. Second, we give a rounding approach for stochastic integer programs that shows that an approximation algorithm for a deterministic analogue yields, with a small constant-factor loss, provably near-optimal solutions for the stochastic problem. Thus we obtain approximation algorithms for several stochastic problems, including the stochastic versions of the set cover, vertex cover, facility location, multicut (on trees) and multicommodity flow problems.

In this talk, I will focus mainly on the 2-stage problem and briefly discuss the ideas required to extend our results to the multi-stage setting.

This is joint work with David Shmoys.

Tuesday March 22:
Information Flow Decomposition for Network Coding
Christina Fragouli

Network coding is an emerging area across EE and CS. The main idea is that intermediate nodes in a network not only forward but also process the incoming information flows. The network code is the set of the operations that intermediate nodes perform. This modern application of coding to the theory and practice of communication networks raises novel and exciting research problems, and promises to have a significant impact in diverse areas that include multicasting, network monitoring, reliable delivery, resource sharing, efficient flow control and security.

In this talk we will start by introducing the information flow decomposition. This method shows that very different networks are equivalent from the coding point of view and allows to identify such equivalence classes. This decomposition not only gives structural properties for network coding but allows us to address several other related questions. These questions include, alphabet size bounds for network codes, throughput benefits by using network coding, and design of decentralized network coding algorithms.

Thursday March 24:
A* Search with Triangle Inequality
Andrew Goldberg

We study the problem of finding a shortest path between two given vertices in a directed graph. This is an important problem with many applications, including that of computing driving directions. We are interested in preprocessing the graph using a linear amount of extra space to store additional information, and using this information to answer shortest paths queries quickly.

We use A* search in combination with new distance lower-bounding techniques based on landmarks and triangle inequality. Our experiments show that on some graph classes, such as map graphs, the new algorithm works very well. In particular, we are able to solve the problem on the graph of North America road networks with 30 million vertices on a handheld device. In this talk we describe the algorithm, its implementation, and experimental performance.

Joint work with Chris Harrelson and Renato Werneck

Wednesday March 30:
Pseudothreshold or threshold? - Identifying quantum fault-tolerance thresholds
Krysta Svore

Being able to determine a true threshold value for quantum gates is critical for fault-tolerant quantum computing. We analyze and study pseudothresholds, that is, fault-tolerant threshold results which may not indicate the true threshold value, and their relation to more accurate fault-tolerance thresholds. We expand upon the definition of pseudothreshold and identify several properties of fault-tolerance studies that can lead to a pseudothreshold result. We show differences between a pseudothreshold and a more accurate threshold result for both classical and quantum error correcting codes. This difference can be more than an order of magnitude. Our goal is to emphasize the need for higher levels of concatenation and the consideration of more parameters in a threshold analysis to obtain a more reliable threshold result.

Joint work with Andrew Cross and Isaac Chuang, MIT

Thursday March 31:
Approximation Algorithms for Semidefinite Packing Problems with Applications to Max-Cut and Graph Coloring
David Phillips

We describe the semidefinite analog of the vector packing problem, and show that the semidefinite programming relaxations for MAXCUT [Goemans and Williamson (1995)] and graph coloring [Karger, Motwani and Sudan (1998)] are in this class of problems. We extend a method of Bienstock and Iyengar (2004) which was based on ideas from Nesterov (2003) to design an algorithm for computing $\epsilon$-approximate solutions for this class of semidefinite programs. Our algorithm is in the spirit of Klein and Lu (1996), and decreases the dependence of the run-time on $\epsilon$ from $\epsilon^{-2}$ to $\epsilon^{-1}$. For sparse graphs, our method is faster than the best specialized interior point methods. A significant feature of our method is that it treats both the MAXCUT and the graph coloring problem in a unified manner.

Thursday April 14:
Recursive Markov Chains
Mihalis Yannakakis

Recursive Markov Chains (RMC) extend ordinary finite state Markov chains with the ability to invoke other Markov chains in a potentially recursive manner. They provide a natural abstract model for probabilistic programs with procedures, and other systems involving recursion and probability. RMCs define a class of denumerable Markov chains with a rich theory generalizing that of other well-studied models, eg. Stochastic Context-free Grammars, and Multi-type Branching (Stochastic) Processes. In a series of recent papers with Kousha Etessami (U. of Edinburgh), we have introduced Recursive Markov chains and studied central algorithmic problems regarding questions of termination, reachability, and analysis of the properties of their executions. In this talk we will discuss some of the basic theory and algorithms.

Thursday April 21:
An Unconditional Study of Computational Zero Knowledge

We prove a number of general theorems about ZK, the class of problems possessing computational zero-knowledge proofs. Our results are *unconditional*, in contrast to most previous works on ZK, which rely on the assumption that one-way functions exist.

In particular, we show that:

- Honest-verifier ZK equals general ZK,

- Public-coin ZK equals private-coin ZK,

- ZK is closed under union, and

- ZK with imperfect completeness equals ZK with perfect completeness.

Our approach is to combine the conditional techniques previously used in the study of ZK with the unconditional techniques that we and others developed in the study of SZK, the class of problems possessing statistical zero-knowledge proofs. To enable this combination, we prove that every problem in ZK can be decomposed into a problem in SZK together with a set of instances from which a one-way function can be constructed.

Thursday April 28:
The Complexity of Online Memory Checking
Guy Rothblum

Suppose you want to store a large file on a remote and unreliable server. You would like to verify that your file has not been corrupted, so you store a small private (randomized)fingerprint'' of the file on your own computer. This is the setting for the well-studied authentication problem, and the size of the required private fingerprint is well understood. We study the problem of sub-linear authentication: suppose you would like to encode and store your file in a way that allows you to verify that it has not been corrupted, but without reading all of it. If you only want to read t bits of the file, how large does the size s of the fingerprint need to be? We define this problem formally, and show a tight lower bound on the relationship between s and t when the adversary is not computationally bounded, namely: s x t= omega(n) where n is the file size. This is an easier case of the online memory checking problem, introduced by blum, evans, gemmel, kannan and naor in 1991, and hence the same (tight) lower bound applies also to this problem.

It was previously shown that when the adversary is not computationally bounded, under the assumption that one-way functions exist, it is possible to construct much better online memory checkers and sub-linear authentication schemes. We show that the existence of one-way functions is also a necessary condition: even slightly breaking the s x t= omega(n) lower bound in a computational setting implies the existence of one-way functions.

Joint work with Moni Naor.

Thursday May 5:
Linear Problems and Linear Algorithms
Uri Rothblum

The class of linear problems is defined using formulae in the predicate language over ordered fields. This class contains, for example, linear programs, bounded variable integer programs, linear complementarity problems and rank computations of matrices. A class of linear algorithms is also defined - using arithmetic, comparisons and random selections. Two linear programs are equivalent if they have the same set of data-solution pairs. Similarly two linear algorithms are defined to be equivalent if they have the same set of input-output pairs. A linear problem is defined to completely solved by a linear algorithm if the set of data-solution pairs of the problem equals the set of input-output pairs of the algorithm. We show that the function "completely solved by" is one-to one and onto, that is a bijection, from the equivalence classes of linear problems to all equivalence classes of linear algorithms. The bijection is constructive in both directions. The result demonstrates that linear problems are on the boundary of the problems that can be solved by arithmetic.

Joint work with B. Curtis Eaves (Stanford University)

Thursday May 12:
Property Testing and Additive Approximation for Edge Deletion Problems
Noga Alon

A graph property is monotone if it is closed under deleting vertices and edges. It turns out that for any such property P, and any epsilon > 0, one can approximate efficiently the minimum number of edges that have to be deleted from an n-vertex input graph to get a graph that satisfies P, up to an additive error of epsilon n^2. On the other hand, for any dense monotone property, that is, a property for which there are graphs on n vertices with Omega (n^2) edges that satisfy it, it is NP-hard to approximate this minimum up to an additive error of n^{2-epsilon}, for any fixed positive epsilon. The first part is related to recent results on Graph Property Testing, the second part answers a question raised by Yannakakis in the 80s.

After a brief description of the basic notions of Graph Property Testing, I will discuss the results above. This is based on joint work with A. Shapira, and on joint work with A. Shapira and B. Sudakov.

Thursday May 26:
Balanced Allocations on Graphs
It is well known that if $n$ balls are inserted into $n$ bins, with high probability, the bin with maximum load contains $(1+ o(1)) \log n / \log\log n$ balls. Azar, Broder, Karlin, and Upfal showed that instead of choosing one bin, if $d \ge 2$ bins are chosen at random and the ball inserted into the least loaded of the $d$ bins, the maximum load reduces drastically to $\log\log n /\log d + O(1)$.
We study the two choice balls and bins process when balls are not allowed to choose any two random bins, but only bins that are connected by an edge in an underlying graph. We show that for $n$ balls and $n$ bins, if the graph is almost regular with degree $n^\epsilon$, where $\epsilon$ is not too small, the previous bounds on the maximum load continue to hold. Precisely, the maximum load is $\log \log n + O(1/\epsilon) + O(1)$. So even if the graph has degree $n^{\Omega(1/\log\log n)}$, the maximum load is $O(\log\log n)$. For general $\Delta$-regular graphs, we show that the maximum load is $\log\log n + O(\frac{\log n}{\log (\Delta/\log^4 n)}) + O(1)$ and also provide an almost matching lower bound of $\log \log n + \frac{\log n}{\log (\Delta \log n)}$. Further this does not hold for non-regular graphs even if the minimum degree is high.
Voecking showed that the maximum bin size with $d$ choice load balancing can be further improved to $\log\log n /d + O(1)$ by breaking ties to the left. This requires $d$ random bin choices. We show that such bounds can be achieved by making two random accesses and querying $d/2$ contiguous bins in each access. By grouping a sequence of $n$ bins into $2n/d$ groups, each of $d/2$ consecutive bins, if each ball chooses two groups at random and inserts the new ball into the least-loaded bin in the lesser loaded group, then the maximum load is $\log\log n/d + O(1)$ with high probability. Furthermore, it also turns out that this grouping into groups of size $d/2$ is also essential in achieving this bound, that is, instead of choosing two random groups, if we simply choose two random sets of $d/2$ consecutive bins, then the maximum load jumps to $\log\log n /\log d$.