Columbia Theory Reading Group, Spring 2007

Abstracts for the talks are given below. See the main page for schedule and location information.

Friday, February 9:
Fast Johnson-Lindenstrauss Transform(s)
Nir Ailon
Institute for Advanced Study

A classic functional analytic result by Johnson and Lindenstrauss from 1984 implies that any Euclidean metric on n points can be represented using only k=(log n)/epsilon2 dimensions with distortion epsilon. In computer science, this result has been extensively used in design of algorithms suffering from large dependence of space or time in the dimensionality of the input (images are typically represented using thousands of dimensions).

In spite of the algorithmic usefulness of dimension reduction, and aside from various constant factor speedups and proof simplification results, very little was known about its complexity. In my talk I will present the first asymptotic improvement to the naive implementation as well as various applications and questions.

Based on joint work with Bernard Chazelle.

Friday, March 2:
Analysis of the Pollard Rho Algorithm for Discrete Logs
Stephen Miller
Rutgers University

The Polllard Rho algorithm for the discrete logarithm problem works by iterating a simple function.  When a collision of values is found, it can usually be converted into a solution.  We recently gave the first nontrivial rigorous runtime analysis, proving that collisions occur with high probability in O(sqrt(n)(log n)^3) on any group of prime order n.  When n satisfies a weak arithmetic condition, these collisions provably lead to a solution to the discrete logarithm problem with high probability.  The methods involve studying random walks directed graphs.  The guaranteed randomness of these walks can in turn be used to create an efficient stream cipher.

Friday, March 23:
An Adaptive Packed-Memory Array
Michael Bender
SUNY Stony Brook

In this talk we show how to maintain a dynamic set of N elements in sorted order in a O(N)-sized array in memory or on disk. The idea is to intersperse O(N) gaps so that only a small number of elements need to be moved on an insert or delete.

Because the elements are stored physically in sorted order in memory or on disk, the structure supports extremely efficient data scans or range queries.

We show how to maintain the structure with a small (polylogarithmic) number of element moves even in the worst case. We then present an adaptive structure that performs even better for common insertion patterns. We give theoretical analysis and experimental results.

Joint work with Haodong Hu.

Tuesday, March 27:
Congruences for Context-Free Languages and their Applications
Mahesh Viswanathan

Visibly pushdown languages are a special class of context-free languages that are appropriate for modelling recursive computation and XML documents. Visibly pushdown languages enjoy many of the robust closure properties of regular languages, like closure under interesection and complementation. In this talk we will present congruence-based characterization of visibly pushdown languages, much like the Myhill-Nerode theorem for regular languages. We will then discuss applications of such a characterization to machine learning, test suite generation, and model checking.

Joint work with Rajeev Alur, Viraj Kumar and P. Madhusudan.

Friday, April 13:
Flow Algorithms for Two Pipelined Filtering Problems
Lisa Hellerstein
Polytechnic University

We give related flow algorithms for two pipelined filter-ordering problems, both motivated by query optimization in databases. This work is also connected to sequential testing, minimum-sum set cover, and learning with attribute costs. The first problem is a throughput maximization problem for filter ordering. The second problem also concerns filter ordering, and can be described in terms of finding an optimal strategy for a player in a particular two-person zero-sum game.

In the first problem, there are n predicates with independent selectivities p_1, ..., p_n. Each predicate s_i is evaluated by a separate processor which can process r_i tuples per second. Each tuple is routed through the processors until either (1) it is found not to satisfy a predicate or (2) it is found to satisfy all predicates. The problem is to route tuples through processors so as to maximize the throughput of tuple processing. We present an algorithm that solves this problem in time O(n^2), improving on an earlier O(n^3 \log n) algorithm of Kodialam.

In the second problem, there are n predicates s_i, each with a cost c_i. The problem is to determine the (randomized) order in which to apply the predicates to a single tuple until the tuple has been found to fail a predicate. The goal is to minimize multiplicative regret -- the ratio between the cost incurred in processing the tuple, and the cost that would have been incurred if the predicate that the tuple fails (assuming there is one) had been tested first. A variant of the algorithm for the first problem solves this problem in time O(n^2).

This is joint work with Anne Condon, Amol Deshpande, and Ning Wu.

Friday, May 4:
A Cryptographic Study of Secure Internet Measurement
David Xiao
Princeton University

The Internet is an indispensable part of our information society, and yet its basic foundations remain vulnerable to simple attacks, and one area that remains especially susceptible to attack is routing. There have been increasing efforts in the networking community to incorporate security into current routing protocols, and secure Internet measurement is necessary to inform any routing protocol. In this talk we will show how to use theoretical tools to give a rigorous treatment of this security problem. We believe our work shows that rigorous techniques, and even tools for negative results such as reducing to one-way functions or black-box separations, can have an immediate impact on the study of security problems of real-world importance.

We describe two definitions for this problem: fault detection (FD) where the honest parties only want to know if the packets they sent were dropped or modified or not, and fault localization (FL) where the honest parties want to know in addition where exactly their packets were modified or dropped. Besides traditional per-packet definitions where we want to know the fate of every packet, we also propose *statistical* definitions that reduce the communication and storage overhead of protocols yet retain useful security properties. We will sketch constructions of schemes that satisfy our security definitions and have desirable practical properties.

Next, we show the negative results implied by our definitions. In particular, we can show the necessity of keys, cryptography, and storage in any secure FD or FL scheme. We will describe in detail the proof of our result that any secure black-box construction of a FL protocol requires cryptography to be performed at each nodes. This result uses a novel application of the black-box separation technique of Impagliazzo-Rudich and the learning algorithm of Naor-Rothblum.

This is joint work with Sharon Goldberg, Boaz Barak, and Jennifer Rexford.

Back to Fall 2006 Theory Seminar main page