For Spring 2014, meetings will usually be held on Fridays, 12:00-13:30, in the open meeting area, CSB 477. Abstracts for the talks are given below the talk schedule.

Zhenming Liu

Princeton University

**Abstract:**
We demonstrate a simple, statistically secure ORAM with
computational overhead [;\tilde{O}(\log^2 n);]; previous ORAM
protocols achieve only computational security (under computational
assumptions) or require [;\tilde{\Omega}(\log^3 n);] overhead. An
additional benefit of our ORAM is its conceptual simplicity, which
makes it easy to implement in both software and (commercially
available) hardware.

Our construction is based on recent ORAM constructions due to Shi, Chan, Stefanov, and Li (Asiacrypt 2011) and Stefanov and Shi (ArXiv 2012), but with some crucial modifications to the algorithm that simplify the ORAM and enable our analysis. A central component of our analysis is a reduction to a "supermarket" problem; of independent interest (and of importance to our analysis), we provide an upper bound on the rate of "upset" customers in the "supermarket" problem.
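The "supermarket" process referred to here can be explored with a short simulation. The sketch below is a toy illustration only: the service model, the threshold for calling a customer "upset", and all parameter names are our own simplifications, not taken from the paper.

```python
import random

def supermarket_sim(n_customers, n_queues, d=2, threshold=3, seed=0):
    """Simulate a d-choice 'supermarket' process: each arriving
    customer inspects d queues chosen uniformly at random and joins
    the shortest of them. We count a customer as 'upset' if the queue
    it joins already holds at least `threshold` waiting customers.
    Returns the empirical fraction of upset customers."""
    rng = random.Random(seed)
    queues = [0] * n_queues
    upset = 0
    for _ in range(n_customers):
        # the customer's d uniformly random choices
        choices = [rng.randrange(n_queues) for _ in range(d)]
        best = min(choices, key=lambda i: queues[i])
        if queues[best] >= threshold:
            upset += 1
        queues[best] += 1
        # crude service model: one random queue attempts a departure
        j = rng.randrange(n_queues)
        if queues[j] > 0:
            queues[j] -= 1
    return upset / n_customers
```

Running `supermarket_sim(10000, 100)` gives a quick empirical handle on how the upset rate behaves as `d` or the threshold varies.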

Ilya Volkovich

Princeton University

**Abstract:**
We extend the line of research initiated by Fortnow and Klivans
\cite{FortnowKlivans09} that studies the relationship between
efficient learning algorithms and circuit lower bounds. In
\cite{FortnowKlivans09}, it was shown that if a Boolean circuit class
C has an efficient *deterministic* exact learning algorithm
(i.e., an algorithm that uses membership and equivalence queries), then
[;EXP^{NP} \not\subseteq P/poly[C];] -- the class of all languages that
can be computed by polynomial-size circuits from the class C.
Recently, in \cite{KKO13}, [;EXP^{NP};] was replaced by
[;DTIME(n^{\omega(1)});]. Yet for the models of *randomized* exact
learning or Valiant's PAC learning, the best result so far is a
lower bound against BPEXP (the exponential-time analogue of
BPP). In this paper, we derive stronger lower bounds, as well as
some other consequences, from *randomized* exact learning and
PAC learning algorithms, answering an open question posed in
\cite{FortnowKlivans09} and \cite{KKO13}. In particular, we show that:

We note that in both cases the learning algorithms need not be
*proper*. The extra bit of advice is needed to accommodate the
promise that the probabilities of acceptance and rejection are
bounded away from each other. The exact same problem arises when
trying to prove lower bounds for BPTIME or MA \cite{Barak02, FS04,
Santhanam09}. Removing this bit has been an open problem; we suggest
an approach to settle it in our case.

Clément Canonne

Columbia University

**Abstract:**
We analyze and study a hybrid model for testing and learning probability
distributions, in which the testing algorithm is provided with two different
types of oracles to the unknown distribution D over [n]. More precisely,
we define both the Dual and Cumulative Dual access models, in which the
algorithm can both sample from D and, respectively, for any i in [n],
query the probability weight D(i) or the cumulative probability
D(1) + ... + D(i).
These two models, by generalizing the previously studied sampling and query oracles, allow us to bypass the strong lower bounds established for a number of problems in those settings, while capturing several of their interesting properties -- and providing new insight into their limitations. Finally, we show that while the testing algorithms can in most cases be strictly more efficient, some tasks remain hard even with this additional power.
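The two access models can be made concrete with a small sketch. This is our own illustration of the interface (class and method names are ours): a Dual oracle supports sampling and point-mass queries, and a Cumulative Dual oracle additionally supports CDF queries.

```python
import bisect
import random

class DualOracle:
    """Dual access to a distribution D over [n] (1-indexed):
    samp() draws a sample from D, and eval(i) returns D(i).
    A hypothetical sketch of the access model, not code from the paper."""
    def __init__(self, probs, seed=0):
        assert abs(sum(probs) - 1.0) < 1e-9
        self.probs = list(probs)
        self.cdf = []            # running prefix sums D(1)+...+D(i)
        acc = 0.0
        for p in probs:
            acc += p
            self.cdf.append(acc)
        self.rng = random.Random(seed)

    def samp(self):
        # inverse-transform sampling via the precomputed CDF
        u = self.rng.random()
        return bisect.bisect_left(self.cdf, u) + 1

    def eval(self, i):
        return self.probs[i - 1]

class CumulativeDualOracle(DualOracle):
    """Cumulative Dual access: ceval(i) returns D(1) + ... + D(i)."""
    def ceval(self, i):
        return self.cdf[i - 1]
```

A testing algorithm in these models would be given such an object and charged one query per `samp`, `eval`, or `ceval` call.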

Joint work with Ronitt Rubinfeld (MIT).

Tengyu Ma

Princeton University

**Abstract:**
I will mainly talk about our recent paper on deep learning: "Provable
Bounds for Learning Some Deep Representations." Deep learning, a modern
version of artificial neural networks, has been a very powerful tool in
recent years for many machine learning tasks such as image
recognition, speech recognition, and natural language processing.
However, it remains theoretically mysterious why these modern
heuristics and tricks, together with millions of data points, result
in big improvements on so many important problems. Motivated by these
mysteries, this paper proposes an efficient algorithm that provably
learns a class of deep neural networks under a generative model.

In more detail, the network we study has a constant number of layers; each pair of consecutive layers is connected via a random degree-d bipartite graph with random weights in [-1,1]. The observed data at the bottom is generated as follows: first a sparse hidden vector at the top layer is chosen; then the top layer activates some of the second-layer nodes by multiplying by the connection matrix, applying a point-wise threshold function, and adding some small noise. The second layer then activates the third using the same rules with the new connection matrix, and so on. Our algorithm uses layer-wise learning, and for each layer we exploit the old Hebbian rule, "things that fire together wire together": if two nodes in the same layer are correlated, then they are likely to share a neighbor in the layer above them. From this common-neighbor information, we apply a graph recovery procedure to obtain the support of the network, from which we can recover the hidden variables and the weights. The running time is polynomial, and the sample complexity is quadratic or cubic in the number of nodes, depending on the details of the model.
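The Hebbian "fire together, wire together" step can be illustrated on a one-layer toy model. The sketch below is ours, not the paper's: we replace the threshold gates with a simple noiseless OR-network, activate hidden units independently, and flag visible pairs whose co-activation frequency is large as likely siblings (sharing a parent above).

```python
import itertools
import random

def toy_layer_samples(n_hidden, n_visible, edges, n_samples, p=0.1, seed=0):
    """Generate samples from a toy one-layer OR-network (a
    simplification of the paper's threshold gates): each hidden unit
    fires i.i.d. with probability p, and a visible unit fires if any
    of its hidden neighbors fired. `edges` is a list of
    (hidden, visible) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        h = [rng.random() < p for _ in range(n_hidden)]
        v = tuple(any(h[a] for (a, b) in edges if b == j)
                  for j in range(n_visible))
        samples.append(v)
    return samples

def correlated_pairs(samples, threshold):
    """Return pairs of visible nodes whose empirical co-activation
    frequency exceeds `threshold` -- the 'fire together, wire
    together' signal that two nodes likely share a parent in the
    layer above."""
    n = len(samples[0])
    m = len(samples)
    pairs = []
    for i, j in itertools.combinations(range(n), 2):
        co = sum(1 for s in samples if s[i] and s[j]) / m
        if co > threshold:
            pairs.append((i, j))
    return pairs
```

In the full algorithm these sibling pairs feed a graph recovery procedure that reconstructs the bipartite support; here they only show why correlation reveals common neighbors.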

The talk will be self-contained. This is joint work with Sanjeev Arora, Aditya Bhaskara, and Rong Ge.

Ilias Diakonikolas

University of Edinburgh

**Abstract:**
We study the question of closeness testing for two discrete
distributions. More precisely, given samples from two distributions p
and q over an n-element set, we wish to distinguish between the case
p = q and the case that p is at least epsilon-far from q, in either L1
or L2 distance. Batu et al. (FOCS 2000/JACM 2013) gave the first
sub-linear time algorithms for these problems, albeit with suboptimal
dependence on n and epsilon.

In this talk, I will present simple (and new) testers for both the L1 and L2 settings, with sample complexity that is information-theoretically optimal, up to constant factors, both in the dependence on n and in the dependence on epsilon.
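Collision-based statistics are the standard tool for this kind of closeness testing. The following sketch shows one such statistic (under Poissonized sampling it is an unbiased estimator of m² · ||p − q||₂²); it illustrates the flavor of L2 testers in general and is not the exact tester from the talk.

```python
import random

def closeness_statistic(x_counts, y_counts):
    """Collision-based statistic for L2 closeness testing: with
    sample counts X_i, Y_i drawn from p and q (m samples each,
    Poissonized), E[sum_i (X_i - Y_i)^2 - X_i - Y_i] equals
    m^2 * ||p - q||_2^2.  Large values indicate p != q."""
    return sum((x - y) ** 2 - x - y for x, y in zip(x_counts, y_counts))

def sample_counts(probs, m, rng):
    """Draw m samples from the distribution `probs` and return the
    per-element histogram of counts."""
    counts = [0] * len(probs)
    for _ in range(m):
        u = rng.random()
        acc = 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
    return counts
```

A tester thresholds this statistic: near zero when p = q, and around m²·epsilon² (in L2) when the distributions are far.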

The talk will be based on joint work with S. Chan (MSR), G. Valiant (Stanford) and P. Valiant (Brown).

Andrew Goldberg

Microsoft Research Silicon Valley

**Abstract:**
Given a weighted graph, a distance oracle takes as an input a pair of
vertices and returns the distance between them. The labeling approach
to distance oracle design is to precompute a label for every vertex so
that the distances can be computed from the corresponding labels,
without looking at the graph. The hub labeling (HL) algorithm received
a lot of attention recently in both theoretical and experimental
contexts. In particular, Cohen et al. (2003) developed an O(log n)
approximation algorithm for finding the smallest labeling. Abraham et
al. (2012) introduced a special kind of labeling -- hierarchical
labeling (HHL) -- and showed that it is determined by an ordering of
vertices. They give a practical algorithm to compute small HHLs on
some classes of graphs, including road networks. Akiba et al. (2013)
give a faster algorithm to compute the HHL from an order and apply it
to social networks.
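The query side of hub labeling is simple enough to sketch directly. This is our own minimal illustration (the dictionary representation is ours): each vertex stores a label mapping hubs to distances, and a distance query scans the common hubs of the two labels.

```python
def hub_query(label_u, label_v):
    """Distance query from hub labels. Each label maps hub -> distance
    from that vertex to the hub. For any labeling this returns an
    upper bound on dist(u, v); for a valid hub labeling (every
    shortest path u-v is 'covered' by a common hub on it), it returns
    the exact distance: min over common hubs h of d(u,h) + d(h,v)."""
    best = float("inf")
    for h, du in label_u.items():
        dv = label_v.get(h)
        if dv is not None:
            best = min(best, du + dv)
    return best
```

For example, on the path a - b - c, the labels {a: 0, b: 1} for a and {c: 0, b: 1} for c cover every shortest path, and the query meets at the common hub b. The algorithmic work discussed in the talk is about computing such labels so that they are small.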

We introduce a variant of the Cohen et al. algorithm that is faster both in theory and in practice. We also introduce on-line tie-breaking, which reduces the label size on networks with multiple shortest paths, such as social networks. We introduce an algorithm to efficiently compute good orders; this algorithm scales to large networks and is more robust than previous ones. Finally, we introduce a label compression technique that significantly reduces the size of the labels.

Our experimental results show that the new HHL algorithm is practical on a much broader class of graphs. We also show that it usually produces labels that are not much bigger than those for the HL algorithm with the guaranteed approximation ratio.

Joint work with Daniel Delling, Thomas Pajor, Ruslan Savchenko and Renato Werneck.

Maria Florina Balcan

Georgia Institute of Technology

**Abstract:**
With the variety of applications of machine learning across science,
engineering, and computing in the age of Big Data, re-examining the
underlying foundations of the field has become imperative. In this
talk, I will describe new models and algorithms for important emerging
paradigms, specifically, interactive learning and distributed
learning.

For active learning, where the algorithm itself can ask for labels of carefully chosen examples from a large pool of unannotated data with the goal of minimizing human labeling effort, I will present results giving computationally efficient, optimal label complexity algorithms. I will also discuss learning with more general forms of interaction, as well as unexpected implications of these results for classic supervised learning paradigms.

For distributed learning, I will discuss a model that for the first time addresses the core question of the fundamental communication requirements for achieving accurate learning. Broadly, we consider a framework where massive amounts of data are distributed among several locations, and our goal is to learn a low-error predictor with respect to the overall distribution of the data using as little communication, and as few rounds of interaction, as possible. We provide general upper and lower bounds on the amount of communication needed to learn a given class, as well as broadly-applicable techniques for achieving communication-efficient learning.

Li-Yang Tan

Columbia University

**Abstract:**
We prove a [;\tilde{\Omega}(n^{1/5});] lower bound on the query complexity
of any non-adaptive two-sided error algorithm for testing whether an
unknown n-variable Boolean function is monotone versus constant-far
from monotone. This gives an exponential improvement on the previous
lower bound of [;\Omega(\log n);] due to Fischer et al. (2002). Our
approach extends to give a similar lower bound for monotonicity
testing of Boolean-valued functions over certain hypergrid domains
[;\{1,2,...,m\}^n;].
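For context, the classic non-adaptive tester that such lower bounds are measured against is the edge tester: sample random edges of the hypercube and reject on any violation. The sketch below is that baseline (our own illustration, not a construction from the talk), with Boolean inputs as tuples in {0,1}^n.

```python
import random

def edge_tester(f, n, queries, seed=0):
    """Non-adaptive edge tester for monotonicity of f: {0,1}^n -> {0,1}.
    Sample random edges (x with bit i set to 0, x with bit i set to 1)
    of the hypercube and reject iff a violated edge is found, i.e.
    f(lower endpoint) > f(upper endpoint).  One-sided error: monotone
    functions are always accepted."""
    rng = random.Random(seed)
    for _ in range(queries):
        x = [rng.randrange(2) for _ in range(n)]
        i = rng.randrange(n)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):
            return False  # witnessed a monotonicity violation
    return True  # no violation found; accept
```

The lower bound in the talk says that no non-adaptive two-sided tester, however clever its query distribution, can do with [;\tilde{O}(n^{1/5});]-type query counts what this style of tester attempts.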

Joint work with Rocco Servedio.

Alina Ene

Princeton University

**Abstract:**
In this talk, we consider some fundamental maximum throughput routing
problems in directed graphs. In this setting, we are given a
capacitated directed graph. We are also given source-destination pairs
of nodes (s_1, t_1), (s_2, t_2), ..., (s_k, t_k). The goal is to select
a largest subset of the pairs that are simultaneously routable subject
to the capacities; a set of pairs is routable if there is a
multicommodity flow for the pairs satisfying certain constraints that
vary from problem to problem (e.g., integrality, unsplittability, edge
or node capacities). Two well-studied optimization problems in this
context are the Maximum Edge Disjoint Paths (MEDP) and the
All-or-Nothing Flow (ANF) problem. In MEDP, a set of pairs is routable
if the pairs can be connected using edge-disjoint paths. In ANF, a set
of pairs is routable if there is a feasible multicommodity flow that
fractionally routes one unit of flow from s_i to t_i for each routed
pair (s_i, t_i).
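A classic baseline for MEDP, useful for fixing intuition about the problem definitions above, is the greedy algorithm: route each pair along a shortest path in the remaining graph, then delete the used edges. This sketch is our own illustration of that well-known heuristic, not an algorithm from the talk.

```python
from collections import deque

def greedy_medp(edges, pairs):
    """Greedy heuristic for Maximum Edge-Disjoint Paths in a directed
    graph with unit edge capacities: for each pair (s, t) in order,
    find a shortest path over the still-unused edges via BFS; if one
    exists, route the pair and delete the path's edges. Returns the
    list of routed pairs."""
    adj = {}
    live = set()                      # edges not yet used by a path
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        live.add((u, v))
    routed = []
    for s, t in pairs:
        # BFS from s over currently live directed edges
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, []):
                if (u, v) in live and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            continue                  # pair cannot be routed greedily
        # walk back from t, consuming the path's edges
        v = t
        while parent[v] is not None:
            live.discard((parent[v], v))
            v = parent[v]
        routed.append((s, t))
    return routed
```

Greedy routing of this kind only achieves polynomial approximation factors in general, which is consistent with the hardness picture for directed graphs sketched above.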

MEDP and ANF are both NP-hard and their approximability has attracted substantial attention over the years. Over the last decade, several breakthrough results on both upper bounds and lower bounds have led to a much better understanding of these problems. At a high level, one can summarize this progress as follows. MEDP and ANF admit poly-logarithmic approximations in undirected graphs if one allows constant congestion, i.e., the routing violates the capacities by a constant factor. Moreover, these problems are hard to approximate within a poly-logarithmic factor in undirected graphs even if one allows constant congestion. In sharp contrast, both problems are hard to approximate to within a polynomial factor in directed graphs even if a constant congestion is allowed and the graph is acyclic.

In this talk, we focus on routing problems in directed graphs in the setting in which the demand pairs are symmetric: the input pairs are unordered and a pair s_i t_i is routed only if both the ordered pairs (s_i,t_i) and (t_i,s_i) are routed. Perhaps surprisingly, the symmetric setting can be much more tractable than the asymmetric setting. As we will see in this talk, when the demand pairs are symmetric, ANF admits a poly-logarithmic approximation with constant congestion. We will also touch upon some open questions related to MEDP in directed graphs with symmetric pairs.

This talk is based on joint work with Chandra Chekuri.

Anindya De

IAS

**Abstract:**
The class of polynomial threshold functions (PTFs) is one of the most well-studied classes of Boolean functions in complexity theory and learning theory. The problem of deterministic approximate counting for polynomial threshold functions (over the Boolean hypercube) is an important problem in the area of unconditional derandomization and has been the subject of a lot of research in the past decade. Despite this, the problem of getting an algorithm with polynomial running time and sub-constant error has remained open. In this talk, we offer a positive resolution to this problem.
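To make the counting problem concrete: a PTF is sign(p(x)) for a multilinear polynomial p, and the task is to estimate the fraction of points of the hypercube where the sign is positive. The naive deterministic algorithm, shown below as our own illustration (representing p as a dict from monomials to coefficients, over inputs in {-1,1}^n), enumerates all 2^n points; the work discussed in the talk is about achieving sub-constant error in polynomial time instead.

```python
from itertools import product

def monomial_value(x, mono):
    """Evaluate a multilinear monomial (tuple of variable indices) at x."""
    r = 1
    for i in mono:
        r *= x[i]
    return r

def ptf_fraction(coeffs, n):
    """Exact brute-force count of the fraction of x in {-1,1}^n with
    p(x) > 0, where p is given as {monomial: coefficient}.
    Runs in time 2^n -- the exponential baseline that deterministic
    approximate counting algorithms improve on."""
    count = 0
    for x in product((-1, 1), repeat=n):
        val = sum(c * monomial_value(x, mono) for mono, c in coeffs.items())
        count += val > 0
    return count / 2 ** n
```

For instance, the majority PTF x0 + x1 + x2 and the degree-2 PTF x0·x1 are each positive on exactly half the cube.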

The novel ingredients in this work are:

Joint work with Rocco Servedio.

Contact

There is a mailing list for the reading group. General information about the mailing list (including how to subscribe yourself to it) is available here. If you want to unsubscribe or change your options, send email to theoryread-request@lists.cs.columbia.edu with the word `help' in the subject or body (don't include the quotes), and you will get back a message with instructions.

Comments on this page are welcome; please send them to
`xichen-at-cs.columbia.edu`.