## Columbia Theory Seminar, Fall 2009

For Fall 2009, the usual time for the meetings will be Fridays at 12:00-1:00pm in the CS conference room, CSB 453. Abstracts for talks are given below the talk schedule.

• Wednesday, September 16, 3pm, CS ConferenceRoom: Mariana Raykova: Efficient Robust Private Set Intersection (Abstract)

• Thursday, September 24, 2:30pm, CS ConferenceRoom: Jon Feldman: Online Stochastic Matching: Beating 1-1/e (Abstract)

• Friday, October 9, 12:00pm, CS ConferenceRoom: Ragesh Jaiswal: Streaming k-means Approximation (Abstract)

• Friday, October 23, 12:00pm, CS ConferenceRoom: Ariel Gabizon: Simple Affine Extractors using Dimension Expansion (Abstract)

• Thursday, November 5, 2:30pm, Open Meeting Area (note unusual time and place): Li-Yang Tan: Average sensitivity and noise sensitivity of polynomial threshold functions. (Abstract)

• Friday, November 13: NY Colloquum on Algorithms and Complexity, CUNY graduate center (link)

• Friday, November 20, 12:00pm, CS ConferenceRoom: Seung Geol Choi: Secure Multi-party Computation Minimizing Online Rounds (Abstract)

• Friday, December 4, Security and Privacy Day at NYU Poly (link)

• Thursday, December 10, 2:30pm (note unusual time), CS ConferenceRoom: Andrew Wan: Hard instances for satsifiability and quasi-one-way functions (Abstract)

• Friday, December 11, Theory Day at CUNY (link)

• Tuesday, December 15th, 2009, 1:15pm, 303 Mudd (note unusual time and place): Michel Goemans: Approximating the Asymmetric Traveling Salesman Problem (Abstract)

• Thursday, March 11 (time and place TBA): Steven Meyers: (TBA)

### Talk Abstracts

Wednesday, September 16:
Efficient Robust Private Set Intersection
Mariana Raykova
Columbia University

Abstract: Computing Set Intersection privately and efficiently between two mutually mistrusting parties is an important basic procedure in the area of private data mining. Assuring robustness, namely, coping with potentially arbitrarily misbehaving (i.e., malicious) parties, while retaining protocol efficiency (rather than employing costly generic techniques) is an open problem. In this work the first solution to this problem is presented.

Joint work with Dana Dachman-Soled, Tal Malkin, and Moti Yung

Thursday, September 24:
Online Stochastic Matching: Beating 1-1/e
Jon Feldman

Abstract: We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani and Vazirani gives an approximation ratio of 1 - 1/e = 0.632, a very familiar bound that holds for many online problems; further, the bound is tight in this case. In the online, stochastic case when nodes are drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio, but still, no algorithm is known that beats the 1 - 1/e bound.

Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking this 1 - 1/e barrier. Furthermore, we show that no online algorithm can produce a 1-epsilon approximation for an arbitrarily small epsilon for this problem.

Our algorithms are based on computing an optimal offline solution to the expected instance, and using this solution as a guideline in the process of online allocation. We employ a novel application of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the expected instance, and use both of them in the online algorithm in a prescribed preference order. To identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then carefully decompose this maximum flow to two edge-disjoint (near-)matchings. In addition to guiding the online decision making, these two offline solutions are used to characterize an upper bound for the optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival distribution.

Joint work with Aranyak Mehta, Vahab Mirrokni and S. Muthukrishnan.

Friday, October 9:
Streaming k-means Approximation
Ragesh Jaiswal
Columbia University

Abstract: We extend recent results of Arthur and Vassilvitskii to provide a pseudo-approximation algorithm for the k-means clustering problem in the batch (non-streaming) setting. We then use this algorithm in an extended version of the divide-and-conquer strategy by Guha et al. to obtain a streaming algorithm for the k-means problem with good approximation/memory tradeoffs. More specifically, we first extend the k-means++ algorithm, due to Arthur and Vassilvitskii, which is a randomized algorithm that gives an O(log k) approximation to the k-means objective in the batch setting. Our new algorithm, is shown to give a constant approximation factor to the k-means objective, and outputs O(k log k) centers. We then provide a variant of Guha et al.'s algorithm for k-means in the one-pass (streaming) setting, resulting in an efficient algorithm which outputs exactly k centers and gives a tradeoff of O(n^a) memory for an O(c^(1/a) log(k)) approximation for any constant a<1 and some fixed constant c>1. Empirical evaluations on real and simulated data reveal the practical utility of our method. I will also talk about a quantitatively improved version of our first result obtained independently by Aggarwal et al. and some interesting open questions.

This is a joint work with Claire Monteleoni (CCLS, Columbia) and Nir Ailon (Google).

Friday, October 23:
Simple Affine Extractors using Dimension Expansion
Ariel Gabizon
Columbia University

Abstract: Let F be the field of q elements. An (n,k)-affine extractor is a mapping D:Fn -> {0,1} such that for any k-dimensional affine subspace X ⊆ Fn, D(x) is an almost unbiased bit when x is chosen uniformly from X.

Loosely speaking, the problem of explicitly constructing affine extractors gets harder as q gets smaller and easier as k gets larger. This is reflected in previous results: When q is large enough', specifically q=(n2), Gabizon and Raz construct affine extractors for any k ≥ 1. In the hardest case', i.e. when q=2, Bourgain constructs affine extractors for k ≥ δ n for any constant (and even slightly sub-constant) δ >0. Our main result is the following: Fix any k ≥ 2 and let d=5n/k. Then whenever q> 2d^2 and p=char(F)>d, we give an explicit (n,k)-affine extractor. For example, when k= δn for constant δ>0, we get an extractor for a field of constant size Ω( (1/δ)^2).

Thus our result may be viewed as a field-size/dimension' tradeoff for affine extractors. Although for large k we are not able to improve (or even match) the previous result of Bourgain, our construction and proof have the advantage of being very simple: Assume n is prime and d is odd, and fix any non-trivial linear map T:Fn |-> F. Define QR:F |-> {0,1} by QR(x)=1 if and only if x is a quadratic residue. Then, the function D:Fn |->{0,1} defined by D(x)=QR(T(xd)) is an (n,k)-affine extractor.

Our proof uses a result of Heur, Leung and Xiang giving a lower bound on the dimension of products of subspaces.

Joint Work with Matt DeVos.

Thursday, November 5:
Average sensitivity and noise sensitivity of polynomial threshold functions
Li-Yang Tan
Columbia University

Abstract: We give the first non-trivial upper bounds on the average sensitivity and noise sensitivity of degree-d polynomial threshold functions (PTFs). These bounds hold both for PTFs over the Boolean hypercube and for PTFs over R^n under the standard n-dimensional Gaussian distribution. Our bound on the Boolean average sensitivity of PTFs represents progress towards the resolution of a conjecture of Gotsman and Linial, which states that the symmetric function slicing the middle d layers of the Boolean hypercube has the highest average sensitivity of all degree-d PTFs. Via the L_1 polynomial regression algorithm of Kalai et al., our bounds on Gaussian and Boolean noise sensitivity yield polynomial-time agnostic learning algorithms for the broad class of constant-degree PTFs under these input distributions.

We also give a regularity lemma for degree-d PTFs over the Boolean cube {-1,1}^n. This result shows that every degree-d PTF can be decomposed into a constant number of subfunctions such that almost all of the subfunctions are close to being regular PTFs.

Joint work with Ilias Diakonikolas, Prasad Raghavendra, Rocco Servedio and Andrew Wan.

Friday, November 20:
Secure Multi-party Computation Minimizing Online Rounds
Seung Geol Choi
Columbia University

Abstract: Multi-party secure computations are general important procedures to compute any function while keeping the security of private inputs. In this work we ask whether preprocessing can allow low latency (that is, small round) secure multi-party protocols that are universally-composable (UC). In particular, we allow any polynomial time preprocessing as long as it is independent of the exact circuit and actual inputs of the specific instance problem to solve, with only a bound k on the number of gates in the circuits known.

To address the question, we first define the model of "Multi-Party Computation on Encrypted Data"(MP-CED), implicitly described in [FH96,JJ00,CDN01,DN03]. In this model, computing parties establish a threshold public key in a preprocessing stage, and only then private data, encrypted under the shared public key, is revealed. The computing parties then get the computational circuit they agree upon and evaluate the circuit on the encrypted data. The MP-CED model is interesting since it is well suited for modern computing environments, where many repeated computations on overlapping data are performed.

We present two different round-efficient protocols in this model:
- The first protocol generates k garbled gates in the preprocessing stage and requires only two (online) rounds.
- The second protocol generates a garbled universal circuit of size O(k log k) in the preprocessing stage, and requires only one (online) round (i.e., an obvious lower bound), and therefore it can run asynchronously.

Both protocols are secure against an active, static adversary controlling any number of parties. When the fraction of parties the adversary can corrupt is less than half, the adversary cannot force the protocols to abort. The MP-CED model is closely related to the general Multi-Party Computation (MPC) model and, in fact, both can be reduced to each other. The first (resp. second) protocol above naturally gives protocols for three-round (resp. two-round) universally composable MPC secure against active, static adversary controlling any number of parties (with preprocessing).

Joint work with Ariel Elbaz, Tal Malkin, and Moti Yung.

Thursday, December 10:
Hard instances for satsifiability and quasi-one-way functions
Andrew Wan
Columbia University

Abstract: We give an efficient algorithm that takes as input any (probabilistic) polynomial time algorithm A which purports to solve SAT and finds, for infinitely many input lengths, SAT formulas phi and witnesses w such that A claims phi is unsatisfiable, but w is a satisfying assignment for phi (assuming NP notin RP). This solves an open problem posed in the work of Gutfreund, Shaltiel, and Ta-Shma (CCC 2005). Following Gutfreund et al., we also extend this to give an efficient sampling algorithm (a "quasi-hard sampler") which generates hard instance/witness pairs for all algorithms running in some fixed polynomial time.

We ask how our sampling algorithm relates to various cryptographic notions. We show that our sampling algorithm gives a simple construction of quasi-one-way functions, a weakened notion of standard one-way functions. We also investigate the possibility of obtaining pseudorandom generators from our quasi-one-way functions and show that a large class of reductions that work in the standard setting must fail.

Joint work with Andrej Bogdanov and Kunal Talwar.

Tuesday, December 15:
Approximating the Asymmetric Traveling Salesman Problem
Michel Goemans
MIT

Abstract: The traveling salesman problem (TSP) --- the problem of finding the shortest tour through a set of cities --- is the prototypical combinatorial optimization problem. It comes in two flavors, the symmetric (STSP) and asymmetric (ATSP) versions; the latter corresponds to the setting in which the cost of traveling from A to B might differ from the cost from B to A. Whether one can efficiently find solutions in the asymmetric case that are within a constant factor of optimal has proved to be elusive despite considerable efforts, while the same question in the symmetric case has been settled for decades.

In this talk, I will describe a randomized algorithm for ATSP which delivers a solution within a factor $O(\log n/ \log\log n)$ of the optimum. This is the first approximation algorithm that breaks the logarithmic glass ceiling of approximability. The algorithm is somewhat reminiscent of Christofides' algorithm for the symmetric case. It first solves the classical Held-Karp linear programming relaxation of the problem, then generates a spanning tree T while disregarding the orientations of the arcs, and finally augments it into a directed Eulerian graph at minimum cost by solving a minimum cost flow problem. The cost of the final step depends on the {\it thinness} of the tree T. We show how to construct a suitable probability distribution on spanning trees which are sufficiently thin with high probability.

During the talk, our tour will bring us to visit polyhedral arguments, negative correlation, random spanning trees, maximum entropy distributions, approximate minimum cuts, Kirchhoff's matrix tree theorem, and several other concepts.

This is joint work with Arash Asadpour, Aleksander Madry, Shayan Oveis Gharan and Amin Saberi.

BIO
Michel Goemans is the Leighton Family Professor of Applied Mathematics at the Massachusetts Institute of Technology, and a faculty member of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and of MIT Operations Research Center (ORC). He is currently the Chairman of Applied Mathematics at MIT. He has held an Adjunct Professorship at the University of Waterloo, a Professorship at the University of Louvain and a visiting Professorship at RIMS, Kyoto.

His research --- in the areas of discrete algorithms and combinatorial optimization --- has been rewarded by several prizes, in particular the 2000 AMS-MPS Fulkerson Prize and twice the SIAM Optimization prize. He is an ACM Fellow, a Guggenheim Fellow and a Sloan Foundation Fellow.

Contact tal-at-cs.columbia.edu if you want to volunteer to give a talk (especially for students!). The talk can be about your or others' work. It can be anything from a polished presentation of a completed result, to an informal black-board presentation of an interesting topic where you are stuck on some open problem. It should be accessible to a general theory audience. I will be happy to help students choose papers to talk about.
There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here. If you want to unsubscribe or change your options, send email to theoryread-request@lists.cs.columbia.edu with the word help' in the subject or body (don't include the quotes), and you will get back a message with instructions.