For Fall 2009, the usual time for the meetings will be Fridays at 12:00-1:00pm in the CS conference room, CSB 453. Abstracts for talks are given below the talk schedule.
Abstract: Computing Set Intersection privately and efficiently between two mutually mistrusting parties is an important basic procedure in the area of private data mining. Assuring robustness, namely, coping with potentially arbitrarily misbehaving (i.e., malicious) parties, while retaining protocol efficiency (rather than employing costly generic techniques) is an open problem. In this work the first solution to this problem is presented.
Joint work with Dana Dachman-Soled, Tal Malkin, and Moti Yung
Abstract: We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani and Vazirani gives an approximation ratio of 1 - 1/e = 0.632, a very familiar bound that holds for many online problems; further, the bound is tight in this case. In the online, stochastic case when nodes are drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio, but still, no algorithm is known that beats the 1 - 1/e bound.
Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking this 1 - 1/e barrier. Furthermore, we show that no online algorithm can produce a 1-epsilon approximation for an arbitrarily small epsilon for this problem.
Our algorithms are based on computing an optimal offline solution to the expected instance, and using this solution as a guideline in the process of online allocation. We employ a novel application of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the expected instance, and use both of them in the online algorithm in a prescribed preference order. To identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then carefully decompose this maximum flow to two edge-disjoint (near-)matchings. In addition to guiding the online decision making, these two offline solutions are used to characterize an upper bound for the optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival distribution.
Joint work with Aranyak Mehta, Vahab Mirrokni and S. Muthukrishnan.
Abstract: We extend recent results of Arthur and Vassilvitskii to provide a pseudo-approximation algorithm for the k-means clustering problem in the batch (non-streaming) setting. We then use this algorithm in an extended version of the divide-and-conquer strategy by Guha et al. to obtain a streaming algorithm for the k-means problem with good approximation/memory tradeoffs. More specifically, we first extend the k-means++ algorithm, due to Arthur and Vassilvitskii, which is a randomized algorithm that gives an O(log k) approximation to the k-means objective in the batch setting. Our new algorithm, is shown to give a constant approximation factor to the k-means objective, and outputs O(k log k) centers. We then provide a variant of Guha et al.'s algorithm for k-means in the one-pass (streaming) setting, resulting in an efficient algorithm which outputs exactly k centers and gives a tradeoff of O(n^a) memory for an O(c^(1/a) log(k)) approximation for any constant a<1 and some fixed constant c>1. Empirical evaluations on real and simulated data reveal the practical utility of our method. I will also talk about a quantitatively improved version of our first result obtained independently by Aggarwal et al. and some interesting open questions.
This is a joint work with Claire Monteleoni (CCLS, Columbia) and Nir Ailon (Google).
Abstract: Let F be the field of q elements. An (n,k)-affine extractor is a mapping D:Fn -> {0,1} such that for any k-dimensional affine subspace X ⊆ Fn, D(x) is an almost unbiased bit when x is chosen uniformly from X.
Loosely speaking, the problem of explicitly constructing affine extractors gets harder as q gets smaller and easier as k gets larger. This is reflected in previous results: When q is `large enough', specifically q=(n2), Gabizon and Raz construct affine extractors for any k ≥ 1. In the `hardest case', i.e. when q=2, Bourgain constructs affine extractors for k ≥ δ n for any constant (and even slightly sub-constant) δ >0. Our main result is the following: Fix any k ≥ 2 and let d=5n/k. Then whenever q> 2d^2 and p=char(F)>d, we give an explicit (n,k)-affine extractor. For example, when k= δn for constant δ>0, we get an extractor for a field of constant size Ω( (1/δ)^2).
Thus our result may be viewed as a `field-size/dimension' tradeoff for affine extractors. Although for large k we are not able to improve (or even match) the previous result of Bourgain, our construction and proof have the advantage of being very simple: Assume n is prime and d is odd, and fix any non-trivial linear map T:Fn |-> F. Define QR:F |-> {0,1} by QR(x)=1 if and only if x is a quadratic residue. Then, the function D:Fn |->{0,1} defined by D(x)=QR(T(xd)) is an (n,k)-affine extractor.
Our proof uses a result of Heur, Leung and Xiang giving a lower bound on the dimension of products of subspaces.
Joint Work with Matt DeVos.
Abstract: We give the first non-trivial upper bounds on the average sensitivity and noise sensitivity of degree-d polynomial threshold functions (PTFs). These bounds hold both for PTFs over the Boolean hypercube and for PTFs over R^n under the standard n-dimensional Gaussian distribution. Our bound on the Boolean average sensitivity of PTFs represents progress towards the resolution of a conjecture of Gotsman and Linial, which states that the symmetric function slicing the middle d layers of the Boolean hypercube has the highest average sensitivity of all degree-d PTFs. Via the L_1 polynomial regression algorithm of Kalai et al., our bounds on Gaussian and Boolean noise sensitivity yield polynomial-time agnostic learning algorithms for the broad class of constant-degree PTFs under these input distributions.
We also give a regularity lemma for degree-d PTFs over the Boolean cube {-1,1}^n. This result shows that every degree-d PTF can be decomposed into a constant number of subfunctions such that almost all of the subfunctions are close to being regular PTFs.
Joint work with Ilias Diakonikolas, Prasad Raghavendra, Rocco Servedio and Andrew Wan.
Abstract: Multi-party secure computations are general important procedures to compute any function while keeping the security of private inputs. In this work we ask whether preprocessing can allow low latency (that is, small round) secure multi-party protocols that are universally-composable (UC). In particular, we allow any polynomial time preprocessing as long as it is independent of the exact circuit and actual inputs of the specific instance problem to solve, with only a bound k on the number of gates in the circuits known.
To address the question, we first define the model of "Multi-Party Computation on Encrypted Data"(MP-CED), implicitly described in [FH96,JJ00,CDN01,DN03]. In this model, computing parties establish a threshold public key in a preprocessing stage, and only then private data, encrypted under the shared public key, is revealed. The computing parties then get the computational circuit they agree upon and evaluate the circuit on the encrypted data. The MP-CED model is interesting since it is well suited for modern computing environments, where many repeated computations on overlapping data are performed.
We present two different round-efficient protocols in this model:
- The first protocol generates k garbled gates in the preprocessing stage and
requires only two (online) rounds.
- The second protocol generates a garbled universal circuit of size O(k log k)
in the preprocessing stage, and requires only one (online) round (i.e., an
obvious lower bound), and therefore it can run asynchronously.
Both protocols are secure against an active, static adversary controlling any number of parties. When the fraction of parties the adversary can corrupt is less than half, the adversary cannot force the protocols to abort. The MP-CED model is closely related to the general Multi-Party Computation (MPC) model and, in fact, both can be reduced to each other. The first (resp. second) protocol above naturally gives protocols for three-round (resp. two-round) universally composable MPC secure against active, static adversary controlling any number of parties (with preprocessing).
Joint work with Ariel Elbaz, Tal Malkin, and Moti Yung.
Abstract: We give an efficient algorithm that takes as input any (probabilistic) polynomial time algorithm A which purports to solve SAT and finds, for infinitely many input lengths, SAT formulas phi and witnesses w such that A claims phi is unsatisfiable, but w is a satisfying assignment for phi (assuming NP notin RP). This solves an open problem posed in the work of Gutfreund, Shaltiel, and Ta-Shma (CCC 2005). Following Gutfreund et al., we also extend this to give an efficient sampling algorithm (a "quasi-hard sampler") which generates hard instance/witness pairs for all algorithms running in some fixed polynomial time.
We ask how our sampling algorithm relates to various cryptographic notions. We show that our sampling algorithm gives a simple construction of quasi-one-way functions, a weakened notion of standard one-way functions. We also investigate the possibility of obtaining pseudorandom generators from our quasi-one-way functions and show that a large class of reductions that work in the standard setting must fail.
Joint work with Andrej Bogdanov and Kunal Talwar.
Abstract:
The traveling salesman problem (TSP) --- the problem of finding the
shortest tour through a set of cities --- is the prototypical
combinatorial optimization problem. It comes in two flavors, the
symmetric (STSP) and asymmetric (ATSP) versions; the latter
corresponds to the setting in which the cost of traveling from A to B
might differ from the cost from B to A. Whether one can efficiently
find solutions in the asymmetric case that are within a constant
factor of optimal has proved to be elusive despite considerable
efforts, while the same question in the symmetric case has been
settled for decades.
In this talk, I will describe a randomized algorithm for ATSP which
delivers a solution within a factor $O(\log n/ \log\log n)$ of the
optimum. This is the first approximation algorithm that breaks the
logarithmic glass ceiling of approximability. The algorithm is
somewhat reminiscent of Christofides' algorithm for the symmetric
case. It first solves the classical Held-Karp linear programming
relaxation of the problem, then generates a spanning tree T while
disregarding the orientations of the arcs, and finally augments it
into a directed Eulerian graph at minimum cost by solving a minimum
cost flow problem. The cost of the final step depends on the {\it
thinness} of the tree T. We show how to construct a suitable
probability distribution on spanning trees which are sufficiently thin
with high probability.
During the talk, our tour will bring us to visit polyhedral arguments,
negative correlation, random spanning trees, maximum entropy
distributions, approximate minimum cuts, Kirchhoff's matrix tree
theorem, and several other concepts.
This is joint work with Arash Asadpour, Aleksander Madry, Shayan Oveis
Gharan and Amin Saberi.
BIO
Michel Goemans is the Leighton Family Professor of Applied Mathematics
at the Massachusetts Institute of Technology, and a faculty member of
MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)
and of MIT Operations Research Center (ORC). He is currently the
Chairman of Applied Mathematics at MIT. He has held an Adjunct
Professorship at the University of Waterloo, a Professorship at the
University of Louvain and a visiting Professorship at RIMS, Kyoto.
His research --- in the areas of discrete algorithms and combinatorial
optimization --- has been rewarded by several prizes, in particular
the 2000 AMS-MPS Fulkerson Prize and twice the SIAM Optimization
prize. He is an ACM Fellow, a Guggenheim Fellow and a Sloan Foundation
Fellow.
Comments on this page are welcome; please send them to tal-at-cs.columbia.edu
Last updated 11/10/2009.