Columbia Theory Reading Group, Spring 2006

Abstracts for the talks are given below. See the main page for schedule and location information.

Thursday, January 19th:
Strategic Models for Information Flow in Social Networks and Peer-to-Peer Systems
Jon Kleinberg
Cornell University

The concurrent growth of on-line communities exhibiting large-scale social structure, and of large decentralized peer-to-peer file-sharing systems, has stimulated recent interest in understanding networks of interacting agents as economic systems. Motivated by such systems, we study some general classes of networks in which users are provided with incentives to send or receive information.

We focus primarily on query incentive networks, in which users seeking information or services can pose queries, together with incentives for answering them, that are propagated along paths in a network. Using analysis based on branching processes, we find that the network operates efficiently when it is above a critical ``effective branching factor,'' but below this critical point the sizes of query incentives needed to extract information from the network grow dramatically.

We also briefly discuss some recent empirical studies of recommendation incentive networks, a ``dual'' type of system in which information is unsolicited rather than explicitly sought: users are offered incentives to pass recommendations to their network neighbors. We analyze the types of collective behavior that result, using data from a large on-line retailer.

This is based on joint work with Prabhakar Raghavan, Jure Leskovec, and Ajit Singh.

Monday, January 23rd:
Knuth Lecture: Probability and Recursion
Mihalis Yannakakis
Columbia University

We discuss recent work on the algorithmic analysis of systems involving recursion and probability. Recursive Markov chains extend ordinary finite state Markov chains with the ability to invoke other Markov chains in a potentially recursive manner. They offer a natural abstract model of probabilistic programs with procedures, and generalize other classical well-studied stochastic models, eg. multi-type Branching Processes and Stochastic Context-free Grammars. Recursive Markov Decision Processes and Recursive Simple Stochastic Games similarly extend ordinary finite Markov decision processes and stochastic games, and they are natural models for recursive systems involving both probabilistic and nonprobabilistic actions. In a series of recent papers with Kousha Etessami (U. of Edinburgh), we have introduced these models and studied central algorithmic problems regarding questions of termination, reachability,and analysis of the properties of their executions. In this talk we will present some of the basic theory and algorithms.

Thursday, January 26th:
Classification and Partitioning: Recent Bounds
Seffi Naor
Microsoft Research and Computer Science Dept., Technion

The Metric Labeling problem is an elegant and powerful mathematical model capturing a wide range of classification problems that arise in computer vision and related fields. In a typical classification problem, one wishes to assign labels to a set of objects to optimize some measure of the quality of the labeling. The metric labeling problem captures a broad range of classification problems where the quality of a labeling depends on the pairwise relations between the underlying set of objects, as described by a weighted graph. Additionally, a metric distance function on the labels is defined, and for each label and each vertex, an assignment cost is given. The goal is to find a minimum-cost assignment of the vertices to the labels. The cost of the solution consists of two parts: the assignment costs of the vertices and the separation costs of the edges (each edge pays its weight times the distance between the two labels to which its endpoints are assigned). Metric labeling has many applications as well as rich connections to some well known problems in combinatorial optimization. The balanced metric labeling problem has an additional constraint which upper bounds the number of vertices that can be assigned to each label, thus capturing various well-known balanced graph partitioning problems.

In the talk I will discuss the rich body of work in approximation algorithms, as well as lower bounds on approximability, that has been developed in recent years for the metric labeling problem and its variants.

Tuesday, February 14th:
A relative-error CUR decomposition for matrices and its data applications
Petros Drineas
Resselaer Polytechnic Institute

The Singular Value Decomposition (SVD) of matrices (and the related Principal Components Analysis) has found numerous applications in extracting structure from datasets in various domains, e.g., the social sciences, biology, chemistry, medicine, Internet data, etc. The extracted structure is expressed in terms of singular vectors, which are linear combinations of all the input data and might lack an intuitive physical interpretation. In this talk we shall discuss a matrix decomposition of the form CUR which, given an m-by-n matrix A, approximates A by the product CUR, where C consists of a few columns of A, R consists of a few rows of A, and U is a carefully constructed, constant-sized matrix. Such decompositions might be easier to interpret in applications, since both C and R are subsets of the data. Previous matrix decompositions of this form only guaranteed additive error approximations. Our construction for C, U, and R takes O(min{m^2n, mn^2}) time and guarantees that the Frobenius of the error matrix A-CUR is almost optimal in a relative error sense.

Finally, we will sketch applications of this algorithm in the analysis of human genotype data.

Wednesday, March 8th:
XML compression and indexing
Paolo Ferragina
Department of Informatica, University of Pisa, Italy

We present a novel approach to the problem of succinct manipulation of labeled trees by designing what we call the XBW-transform of trees, in the spirit of the well-known Burrows-Wheeler transform for strings. XBW-transform uses path-sorting and grouping to linearize a labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. Using the properties of the XBW-transform, we (i) derive the first-known (near-)optimal results for succinct representation of labeled trees with constant-time navigation operations, (ii) optimally support the powerful subpath-search operation for the first time, and (iii) introduce a notion of tree entropy and present linear time algorithms for compressing a given labeled tree up to its entropy beyond the information-theoretic lower bound averaged over all tree inputs. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing. We also present a prototype implementation of the XBW-transform and apply it to the compression and indexing of XML documents.

Joint work with F. Luccio, G. Manzini and S. Muthukrishnan.

Tuesday, March 21st:
Learning on Groups and the Permceptron
Risi Kondor
Columbia University

I present a general framework for regularization and learning on finite groups, with particular emphasis on the symmetric group and its relevance to learning permutations and rankings. I'll relate this work to the theory of harmonic analysis on non-Abelian groups and recent results on fast Fourier transforms. Remarkably, the perceptron has a variant that can leverage non-Abelian FFTs and address permutation learning problems beyond the reach of other algorithms. This is what I call the "Permceptron".

Joint work with Tony Jebara

Monday, March 27th:
Expanders and the Elliptic Curve DLOG problem
David Jao
Microsoft Research, Cryptography and Anti-Piracy Group

I will talk about a family of graphs which originally arose in cryptography, in studying the difficulty of the discrete logarithm problem on elliptic curves.

These graphs can be shown to be expanders, assuming the generalized Riemann hypothesis (GRH) for Hecke's grossencharacter L-functions. From this one can deduce a random reducibility result for the discrete logarithm problem on the types of elliptic curves that are used in cryptographic applications. The rapid mixing of the random walk on these graphs had previously been assumed in a number of recent attacks on elliptic curve cryptosystems. One application of our estimates of the graph eigenvalues is to give useful, explicit bounds for these mixing times. The graphs themselves represent a new (conditional) construction of expanders, which can be used for other applications.

Joint work with Stephen Miller, Rutgers University and Hebrew University, and Ramarathnam Venkatesan, Microsoft Research Cryptography and Anti-Piracy Group.

Tuesday, April 4th:
Cost Shares and Approximation Algorithms for Network Design Problems
Lisa Fleischer
IBM T.J. Watson Research Center

Cost shares are a well-known concept from the realm of mechanism design. Recently, cost shares have also proved to be a useful tool in analyzing approximation algorithms for various network design problems. We build on this work by introducing very simple cost shares for Steiner forest problems, derived from primal-dual approximation algorithms. The result is simpler analyses and better approximation guarantees for two interesting network design problems, one modeling economies of scale (multicommodity rent-or-buy), the second modeling uncertainties in demand (stochastic Steiner tree). In this talk, I'll review the most relevant previous work and then discuss our new results.

This is joint work with Jochen Konemann, Stefano Leonardi, and Guido Schaefer.

Tuesday, April 4th:
Low-distortion Graph Spanners
Seth Pettie
Max Planck Institut fur Informatik

A spanner H of an undirected, unweighted graph G is a subgraph such that the distance between two points w.r.t. H is not too far from their distance in G, where "not too far" is captured by a distortion function f. Specifically, H is an f-spanner if dist_H(u,v) is at most f(dist_G(u,v)). Nearly all work on spanners and related structures considered only multiplicative distortion, and, in general, provided an undesirable tradeoff between the size of H and the coefficient of distortion.

The "holy grail" question in this area is whether there exist arbitrarily sparse spanners whose distortion is additive and constant. It was known that any graph has an additive 2-spanner with O(n^{3/2}) edges. In this talk I'll present a completely new method for constructing spanners based on an economics-inspired view of the problem. The main result is an additive 6-spanner with O(n^{4/3}) edges. In addition, the technique leads to a spectrum of arbitrarily sparse spanners whose distortion is additive and sublinear in the distance being approximated.

Tuesday, April 18th:
Hardness of Directed Congestion Minimization
Lisa Zhang
Bell Labs, Lucent Technologies

In this talk I'll present a new hardness result for the congestion minimization problem for directed graphs. In particular, we show there is no (log M)^{1-epsilon} approximation under a certain complexity assumption, where M is the number of edges in the network. This almost matches the log M/loglog M upperbound via randomized rounding, due to Raghavan and Thompson.

Time permitting, I'll discuss how recent progress in understanding the complexity of classic problems such as edge-disjoint paths and congestion minimization apply to practical issues arising in optical network optimization.

Tuesday, May 2nd:
Provably Near-Optimal Sampling-Based Policies for Stochastic Inventory Control Models
Retsef Levi
IBM TJ Watson Research Center

We consider two fundamental inventory models, the single-period newsvendor problem and its multi-period extension, but under the assumption that the explicit demand distributions are not known and that the only information available is a set of independent samples drawn from the true distributions. Under the assumption that the demand distributions are given explicitly, these models are well-studied and usually are relatively easy to solve. However, in most real-life scenarios, the true demand distributions are not available or they are too complex to work with. Thus, a sampling-driven algorithmic framework is very attractive, both in practice and theoretically.

In the talk we shall describe how to compute sampling-based policies, that is, policies that are computed based only on observed samples of the demands without any access to, or assumptions on, the true demand distributions. Moreover, we establish bounds on the number of samples required to guarantee that with high probability, the expected cost of the sampling-based policies is arbitrarily close (i.e., with arbitrarily small relative error) compared to the expected cost of an optimal policy which has full access to the demand distributions. The bounds that we develop are general, easy to compute and surprisingly do not depend at all on the specific demand distributions.

Joint work with Robin Roundy and David Shmoys

Thursday, May 18th:
Fast Leader-Election Protocols with Bounded Cheaters' Edge
Spyros Antonakopoulos
Columbia University

We study the leader election problem on $n$ players in the asynchronous full-information model. Our main contention is that the most commonly used performance measure for leader-election protocols, called resilience, is unable to discern whether a small number of players can exercise disproportionate influence on the outcome of a protocol, or not. As a remedy we propose a new quantity, named cheaters' edge, which roughly describes by what multiplicative factor malicious players may increase, through cheating, their probability of getting elected. Arguably, a good protocol must have bounded cheaters' edge.

We present polynomial-time constructions of new leader-election protocols that are fast, in terms of the rounds required (5, 5 log n, and polylog(n) rounds, respectively), but moreover exhibit bounded cheaters' edge under progressively looser restrictions on the number t of malicious players: t <= Theta(n / log n), t <= Theta(n / sqrt(log n log log n))$, and, eventually, no restriction at all---without relying on any a priori knowledge of t. The latter of these three protocols constitutes the first constructive solution to a problem posed by Alon and Naor more than a decade ago.

Back to Spring 2006 Theory Seminar main page