This paper considers the following setting: A data analysis algorithm is given access to noisy labeled data points (x,w.x + noise) where x is a high-dimensional vector, w.x is some linear function of x, and the true labels w.x are corrupted by noise. The goal is to determine whether the coefficient vector w=(w1,…,wn) is sparse (i.e. has almost all of its entries being zero) or is far from sparse (has many nonzero entries).
A long line of work in a field known as “compressed sensing” has addressed a related version of this problem in which the goal is to (approximately) discover the vector w assuming that it is sparse. It is known that algorithms for this “reconstruction” problem require a number of data points which grows with n. This work considers a potentially easier “decision” question, of just deciding whether or not the unknown vector w actually is sparse as described above…but is this problem actually easier, or not?
The main result of this paper gives a sharp characterization of the conditions under which the decision problem actually is easier than the reconstruction problem. It shows that for many distributions of data points, it is possible to solve the decision problem with no dependence on the dimension n, and explains why in the other cases a dependence on n is necessary.
“It was a bit of a surprise that we were able to come up with a sharp characterization of the problem variants for which there needs to be a dependence on the distribution,” said Rocco Servedio, a computer science professor who worked on the paper. “This characterization relied on some results from probability theory that were proved 80 years ago – we were lucky that the right tools for the job had already been prepared for us in the math literature.”
This paper is aimed at understanding the extent to which randomness can help computers solve problems more efficiently. A long line of research in this area aims at coming up with efficient deterministic (non-random) algorithms for various problems that are known to have efficient randomized algorithms; this research program is sometimes known as “derandomization,” as its goal is to show that randomness is not really necessary as a fundamental computational resource.
The most commonly pursued avenue of work along these lines is to design certain particular kinds of deterministic algorithms known as “pseudorandom generators”. Roughly speaking, a pseudorandom generator for a certain type of Boolean (0/1-valued) function gives an efficient deterministic way of estimating the fraction of inputs that make the function output 1 as opposed to 0. A natural way to estimate this fraction is just to do random sampling, but the key challenge is to do it without using randomness — i.e. to “derandomize” the natural random-sampling-based approach.
This particular paper develops new pseudorandom generators for a class of geometrically defined functions called “polynomial threshold functions”. A polynomial threshold function is defined by a multivariable polynomial, like p(x,y,z) = 3x^2 – 6 x^2 y^2 z^3 + 4xyz; the most important parameter of such a polynomial is its degree. (The degree of the example polynomial p(x,y,z) given above is 7, because of the x^2 y^2 z^3 term.) The higher the degree of a polynomial, the more complicated it is. The main result in this paper is a pseudorandom generator for polynomial threshold functions which can handle polynomials of exponentially higher degree than could be achieved in previous work.
The researchers designed a new parallel approximate undirected shortest path algorithm which has polylog(n) depth and near linear total work. This improved a long-standing upper bound obtained by Edith Cohen (STOC’94).
To further illustrate the efficiency of the algorithm, consider the case when the algorithm is run on a single processor, the running time is almost the same as the best known sequential shortest path algorithm, Dijkstra’s algorithm. If the algorithm is run on m processors, where m is the size of the graph, the algorithm only takes a polylog(n) number of rounds — this is also near optimal (many simple problems such as computing XOR of n bits need at least log(n) rounds).
The paper develops a new concept called low hop emulator to solve the parallel approximate shortest paths problem. It shows that for any graph G and any integer k>=1, there exists another graph G’ with size n^(1+2/k) such that 1. Any shortest path in G’ only uses log(k) number of edges, 2, the distance between each pair of vertices in G’ is a poly(k) approximation to their original distance in G.
“This result is surprising and I believe this is an important discovery in graph theory,” said Peilin Zhong, a fourth-year PhD student. “We also show that G’ can be constructed efficiently.”
This work provides insights into the inner-workings of some of the most widely used local search methods.
Local search is one of the most prominent algorithm design paradigms for combinatorial optimization problems. A local search algorithm begins with an initial solution and then follows a path by iteratively moving to a better neighboring solution until a local optimum is reached. However, a recurring phenomenon in local search algorithms is that they are usually efficient in practice but the theory of worst-case analysis indicates the opposite — due to some delicate pathological instances that one may never encounter in practice, the algorithm would be characterized as an exponential running time method.
In order to bridge theory and practice, a smoothed complexity framework has been introduced, a hybrid complexity model between the pessimistic worst-case analysis and the optimistic average-case analysis.
This work analyzes the MaxCut problem, one of the famous 21-Karp NP-complete (”hard”) problems. Given a weighted undirected graph, the MaxCut problem seeks to partition the vertices of the graph into two sets such that the sum of the weights of the edges that join the two sets is maximized. Its applications come from diverse areas including the layout of electronic circuitry, social networks, and statistical physics.
The researchers improved the smoothed complexity of the local search methods for the Max-Cut problem. The result is based on an analysis of long sequences of local changes (flips), which shows that it is very unlikely for every flip in a long sequence to incur a positive but small improvement in the cut weight. They also extend the same upper bound on the smoothed complexity of FLIP to all binary Maximum Constraint Satisfaction Problems.
The Columbia Engineering community has come together to combat the coronavirus pandemic on multiple fronts. In close collabo-ration with the Columbia University Irving Medical Center, we’re leveraging our expertise and innovation to address short term medical needs and long term societal impacts.
Dean Boyce's statement on amicus brief filed by President Bollinger
President Bollinger announced that Columbia University along with many other academic institutions (sixteen, including all Ivy League universities) filed an amicus brief in the U.S. District Court for the Eastern District of New York challenging the Executive Order regarding immigrants from seven designated countries and refugees. Among other things, the brief asserts that “safety and security concerns can be addressed in a manner that is consistent with the values America has always stood for, including the free flow of ideas and people across borders and the welcoming of immigrants to our universities.”
This recent action provides a moment for us to collectively reflect on our community within Columbia Engineering and the importance of our commitment to maintaining an open and welcoming community for all students, faculty, researchers and administrative staff. As a School of Engineering and Applied Science, we are fortunate to attract students and faculty from diverse backgrounds, from across the country, and from around the world. It is a great benefit to be able to gather engineers and scientists of so many different perspectives and talents – all with a commitment to learning, a focus on pushing the frontiers of knowledge and discovery, and with a passion for translating our work to impact humanity.
I am proud of our community, and wish to take this opportunity to reinforce our collective commitment to maintaining an open and collegial environment. We are fortunate to have the privilege to learn from one another, and to study, work, and live together in such a dynamic and vibrant place as Columbia.
Mary C. Boyce
Dean of Engineering
Morris A. and Alma Schapiro Professor