No, We Can’t Say Whether Cancer Is Mostly Bad Luck

Undergrads Millie Yang and Spencer Yen earn engineering fellowships from KPCB

Two Columbia computer science undergrads, Millie Yang and Spencer Yen, have been selected to participate in the Kleiner Perkins Caufield & Byers (KPCB) Fellows Program, a summer program that gives US college students the chance to intern inside Silicon Valley tech companies while being mentored by executives within those companies. The program also entails KPCB-arranged events that include talks by leading individuals from the tech industry and opportunities to meet and network with other fellows and KPCB partners.

Yang and Yen, among 54 students selected from 2000 applicants, will intern at DoorDash, a restaurant delivery service recently in the news for testing delivery robots.

Yang is a junior currently pursuing the computer science Applications Track. She came late to computer science, taking her first class in the subject her first year at Columbia. “I had thought that I would be disadvantaged, but the computer science community at Columbia welcomed me with open arms, and strongly encouraged me to try out computer science classes.” For her part, she embraces technology, and sees it as a way to empower underprivileged communities; it’s a lesson learned first hand after spending a summer volunteering in Peru. “I was astonished to find WiFi at a broadband greater than I had previously used. Watching the local Peruvians use their phones and second-hand tablets to check market prices and communicate in businesses, I realized technology is making a revolutionary impact and leveling the playing field in developing countries.”

For her, DoorDash stood out because it is a start-up facing rapid growth and is striving to revolutionize the delivery industry. The company’s mission statement of “streamlining the world’s cities” echoes her personal experiences. “Born in New York, and raised in Hong Kong and London, I embrace my multicultural background as a crucial part of my identity. My interviewers and recruiters at DoorDash have demonstrated their strong support of the company’s goals, and it made me feel very much at home.”

Yen is the rare first-year student to be selected as a KPCB fellow. Coming from the Silicon Valley (where he was a user of DoorDash), he is imbued with the self-starter ethos endemic to the area. He taught himself—with an assist from YouTube—how to code, and since early high school has been creating iPhone games and apps. “I love that I can come up with an idea and just make it, and seeing people enjoy using something I created makes it all the more rewarding.” Gaining confidence over time in his coding skills, he took the initiative, reaching out to companies in the area for summer work. Two summers ago, he finagled—via cold-emailing—an opportunity to work alongside Mark Pincus at Pincus’s Superlabs incubator where the other hires were all recent grads from MIT. “I learned a lot about how some of the best entrepreneurs and engineers in the industry operate. It was a great experience to see how they thought about design and product, and how they approached writing code.”

While he hesitates to call himself an entrepreneur—he is still in his first year at college after all—the early signs are all there, and he cops to “working on projects.” A startup last year made it into the Almaworks Startup Accelerator program before he and the others involved decided to go in another direction. While not actively looking to start a company, Yen is of course open to the possibility if the right idea comes along.

Neither Yang nor Yen lacked options for this summer, but both were looking for something more than just a job and work experience. The KPCB program came highly recommended; a former fellow from Columbia described the program to Yen as phenomenal. “It’s all about the other fellows in the program and the incredible people you meet through the program,” says Yen. “You learn a lot just by being around ambitious and talented people.”

Yang, who is studying economics in addition to computer science, would agree with that assessment. “KPCB’s Fellowship Program will allow me to be exposed to the venture capital network and educate myself on how startups succeed in the Silicon Valley.”

– Linda Crane
Posted 3/27/2017

Startup cofounded by undergrad Mathew Pregasen renders math script from plain English

His startup Parsegon also has an education focus, explains Pregasen in this Q&A by The Columbia Lion, helping students learn the language of math while enabling interactive math in the classroom and online.

Universities dive into VR research

Computer Science Department establishes Jonathan Gross Prizes to recognize top CS graduates

Beginning this spring, five graduating computer science students—one senior from each of the four undergraduate schools and one student from the master’s program—will be selected to receive the Jonathan Gross Prize. The prize honors students who graduate at the top of their class with the highest overall GPA and a track record of promising innovative contributions to computer science. Students will be recommended for the prize by the department chair, with final recipients selected by a majority vote of the faculty.

The prize is being made possible by an endowment set up by Yechiam (Y.Y.) Yemini, who for 31 years until his retirement in 2011 was a professor of computer science at Columbia. “This university has amazing students, and they deserve to have their academic accomplishments recognized by faculty and their fellow students; in recognizing them, we inspire a culture of excellence.”

The prize recognizes students while honoring Jonathan L. Gross for his 47 years teaching at Columbia, first in the Mathematical Statistics Department and then as a cofounder of the Computer Science Department.

“Jonathan Gross built and led the CS department education program for several decades,” says Yemini. “His continuing contributions to developing CS education over several decades defined his role as the ‘father’ of CS education at Columbia.”

Though primarily a mathematician, Gross had an early interest in computers and believed computing was for everybody. He was Acting Chair of Mathematical Statistics when SEAS committed funds in 1978-1979 to found the Computer Science Department. Gross’s role in starting the new department was fundamental; among his first initiatives, Gross merged the computer science courses that were concurrently being offered by both the Mathematical Statistics and the Electrical Engineering departments. As the new Computer Science Department grew from just a few professors in its first year—it now numbers almost 50 professors and five teaching-oriented faculty—Gross organized department-wide efforts to keep the academic curriculum at the educational forefront.

“I am deeply appreciative that my dear friend YY has honored me by associating my name with this wonderful award he has endowed,” says Gross, now professor emeritus. “It is a wonderful way to recognize the academic excellence of our students.”

For his part, Yemini says he feels fortunate to have the opportunity to help ensure that the legacy of excellence begun by Gross continues. Yemini’s Columbia research lab pioneered a range of seminal network technologies, from autonomic self-managing networks, to microeconomic network control, to high-speed switches and (what became two decades later) software-defined networks. The coauthor of over 100 refereed publications and a holder of 30 patents, Yemini is also a serial entrepreneur who cofounded five high-tech startups. Three of those startups grew out of technologies developed in his Columbia Lab and employ Columbia graduates. His most recent startup, Turbonomic.com (2009), just celebrated its 24^th continuous quarter of record revenues growth, raising $50M investment to value the company at over $800M. Turbonomic’s R&D team includes scores of Columbia graduates in its NYC and Westchester offices.

Now professor emeritus and officially retired, Yemini continues to develop Turbonomic technologies while also teaching the popular Columbia Course “Principles of Innovation and Entrepreneurship (PIE)” and supporting philanthropic projects.

Julia Hirschberg, chair of the Computer Science Department, will announce and distribute the first Jonathan Gross Prizes this May at the graduation reception for computer science students. “This is a wonderful gift by a great emeritus faculty, YY Yemini, to honor another great emeritus, Jonathan Gross. The Department could not be more pleased.”

– Linda Crane
Posted 3/20/2017

Lights! Camera! Algorithms? Columbia Magazine features research of Eitan Grinspun

In creating Moana and other animated movies, artists at Walt Disney Animation Studios use techniques developed by Grinspun and his students to give characters lifelike hair that flows, bounces, and twists.

Awesome people: Kathy McKeown profiled in natural language processing blog

“Whenever a new topic becomes popular in NLP, we find out that Kathy worked on it ten years ago.”

What if Quantum Computers Used Hard Drives Made of DNA?

5 ways deep learning improves your daily life

What the CIA WikiLeaks dump tells us: Encryption works

IBM inches toward human-like accuracy for speech recognition

WikiLeaks’s CIA hacking trove doesn’t live up to the hype

WikiLeaks’s statement that CIA can bypass encryption in apps is misleading, says Steven Bellovin

Encryption is strong in WhatsApp, Signal, and other apps, Bellovin explains in a blog post, forcing the CIA to hack the endpoints (phones) and collect data before it is encrypted.

Yaniv Erlich and Dina Zielinski answer questions in Reddit AMA

Their paper, published last week in Science, describes a DNA storage method capable of encoding 215 petabytes in a single gram of DNA. It is believed to be the highest-density data-storage device ever created.

Minnesota government computer systems: Old, at risk and pricey to fix

Omri Weinstein on information, communication, and computation

What is the minimum amount of resources—time, space, energy, layers in a neural net—required to solve a given computational problem? This fundamental question is the essence of complexity theory and the main driving force behind the research of Omri Weinstein, a theoretical computer scientist who joins Columbia’s Computer Science Department this semester as assistant professor.

While practitioners might be content to use empirical trial and error in the search for increasingly improved heuristics, Weinstein and other theorists aim to answer the above question in a rigorous mathematical manner. Inherent to this research endeavor is proving lower bounds, that is, showing that no algorithm—out of the infinitely many that exist for a given problem—can solve it with fewer than X resources. In this way, theorists begin to unravel the limitations of what can and cannot be done with respect to a given problem. Reasoning simultaneously about all possible algorithms for a problem requires developing “universal” tools and techniques that are independent of any particular algorithm; otherwise researchers are left with the hopeless task of ruling out each individual algorithm of the uncountable number that exists.

The new toolbox Weinstein is developing comes from the field of communication
and information theory. “Communication over a channel is one of the few models of computation that we actually understand and for which we have simple and powerful analytical tools,” says Weinstein. “Computation on the other hand, is a beast which we are still very far from understanding. It is conceivable that there is some unimaginably clever algorithm for solving the Traveling Salesman problem that we are not yet aware of, since we have few tools to reason about processing of information. However, we can capture some of the hardness of the problem by splitting the input between two (or more) parties and asking them to solve the resulting distributed problem over a communication channel. At this point, we can use tools from information theory, combinatorics, and algebra to reason about the problem, bringing it to our home court.”

The machinery Weinstein cites is mostly drawn from information theory, which was established by Claude Shannon some sixty years ago in the context of the data compression problem. In the simplest setup, Alice wishes to transmit to Bob a random message, say a huge database of high-resolution pictures, using minimum communication bits. However, most real-life (as well as theoretical) distributed scenarios are inherently interactive, with processors exchanging small chunks of data over multiple rounds (one natural example is adaptive trading in economic markets such as stock exchanges). Classic information theory does not readily lend itself to the interactive setup; in particular, compressing conversations turns out to be notoriously challenging. Much of Weinstein’s work can be viewed as an interactive extension of Shannon’s classic information theory (aka “information complexity”), where the goal is to understand how information behaves in interactive scenarios, and to develop new information-theoretic methods in communication complexity, tailored for computer science applications: “Many computational models, such as data streaming, circuit design, and economic markets, have no apparent relation to communication problems,” says Weinstein. “Using clever reductions, though, it turns out that many of these models have some intrinsic communication problem implicitly ‘embedded’ within them.”

A prototypical example of this implicit connection are data structures, which enable efficient storage and retrieval of queries in a database. Data structures may not seem to have anything to do with communication, but can in fact be viewed, in some sense, as a communication protocol between the memory (which stores the database) and the processor (which holds a query). Indeed, consider the distributed communication problem in which Alice holds the database and Bob holds a query, and the goal of the players is to answer Bob’s query with respect to Alice’s database. Clearly, neither party can solve the problem on its own; hence communication must take place. Weinstein illustrates a surprisingly simple and beautiful connection to communication complexity: “If there was a too-good-to-be-true data structure that uses too little memory and has fast access time when retrieving an answer to a query, Alice and Bob could use this hypothetical data structure to construct a very efficient communication protocol for the aforementioned distributed problem, by jointly simulating the data structure: in each communication round, Bob asks Alice for the next memory-address his query prescribes, and Alice responds with the content of this memory cell in the data structure. If the data structure has ‘fast’ retrieval time, this would imply a low-communication protocol; hence time-space lower bounds for data-structures can be obtained via lower bounds for the underlying communication problem. Again, this is typically much easier, since the communication model has much more structure.”

Information complexity has gained popularity in recent years and led to some breakthrough results on several long-standing computational problems, some by Weinstein and his colleagues, some by other scientists. His paper Direct Products in Communication Complexity, which investigated the fundamental problem of “whether solving k copies of a single problem is k times harder than solving a single copy,” led to a breakthrough in understanding the limits of parallel computing, showing that solving multiple copies of any two-party distributed problem cannot be remedied much by parallelism (i.e., parallelizing cannot reduce communication significantly). Says Weinstein, “It is hard for me to imagine a proof that does not use information theory here, as this language is much more expressive than previous analytical tools and captures the exact properties needed to solve the problem; to me this paper provides further evidence that information theory is the ‘right’ language to study communication problems.”

A tangent line of research in Weinstein’s work is studying computational aspects of economics. A 2015 paper looked to find the minimum running time required to find an approximate Nash equilibrium in two-player games, one of the most important problems in algorithmic game theory. His papers on this subject played a role in the recent breakthrough results, showing that finding Nash equilibria is some games requires a huge amount of time and communication, thereby undermining the consensus over the basic definition of equilibrium as a practical economic solution concept.

While much of his work is theoretical and abstract, Weinstein is also looking into applied problems, especially in an era where rapid evolution in technology is constantly posing new challenges that require new techniques. With Columbia colleagues, Weinstein is studying an asymmetric compression problem that naturally arises from the Internet of Things (IoT) model. Here, millions of weak tags—low resource, low memory, low power devices—are required to convey messages to a central (powerful) gateway in an extremely efficient manner, both in terms of bandwidth and processing power. Compression obviously plays a key role in this model. Alas, standard compression schemes rely on the assumption that both the encoder and the decoder know the underlying prior distribution of the message to be sent, an assumption that is not realistic on the tag side (as they do not have the space nor the energy to store expensive statistics). Says Weinstein, “Can we nevertheless exploit the fact that the gateway knows the prior distribution, even if the tags don’t?” Perhaps surprisingly, the answer is “yes.” Weinstein and his co-authors developed a low-communication encoding scheme, which further admits efficient decoding time of the message on the gateway side. Once again, information theory and communication complexity are lurking beneath this fascinating problem.

“Omri’s research strengths and interests, especially his expertise in the nascent field of information complexity, complement and enhance the scope of our existing efforts in theoretical computer science. We are thrilled he has decided to join the Computer Science Department at Columbia, and we welcome him and look forward to working together!” said Rocco Servedio, Professor of Computer Science and a member of the department’s theory group.

Weinstein received his PhD from Princeton University in 2015 under the supervision of Mark Braverman, and holds a BSc in mathematics and computer science from Tel-Aviv University. He was named a 2016 Simons Society Junior Fellow and a Siebel Scholar in 2015; in 2013, he was awarded a Simons Graduate Fellowship.

Posted 3/7/2017

Study describing efficient method for storing information in DNA garners wide press coverage

In a new study in Science, Yaniv Erlich and coauthor Dina Zielinski describe a DNA storage technique capable of encoding 215 petabytes in a single gram of DNA. Robust, scalable, and resistant to errors, the method is 60 percent more efficient than previous DNA storage strategies and approaches 90 percent of the theoretical maximum amount of information per nucleotide. News of the method, which has potential for solving the world’s data storage requirements, was widely covered in prominent news outlets.

This Speck of DNA Contains a Movie, a Computer Virus, and an Amazon Gift Card

DNA could store all of the world’s data in one room

Researchers store computer operating system and short movie on DNA

Suduko Hints at New Encoding Strategy for DNA Data Storage

Pushing the Theoretical Limits of DNA Data Storage

To make better computers, researchers look to microbiology

(Subscription) What’s Stored in DNA? An Old French Movie and a $50 Gift Card

Q&A: Encoding Classic Films, Computer Operating Systems in DNA

Hard drives of the future could be made of DNA

Study coauthored by Yaniv Erlich describes maximizing data-storage capacity of DNA molecules

The approach—scalable, highly robust, and resistant to errors—is 60% more efficient than previous DNA storage strategies and approaches 90% of the theoretical maximum amount of information per nucleotide.

EyeStyle, a PhD side project, is accepted into two accelerator programs

EyeStyle: Pick an image, select the item to search, and browse similar items.

EyeStyle—a visual platform for connecting shoppers with retailers—was recently accepted into both the NYC Media Lab Combine program and the Verizon Connected Futures Research and Prototyping Challenge. The two accelerator programs provide funding and mentoring to New York City university teams for help in developing, prototyping, and commercializing media-focused technologies. EyeStyle, which will receive $25K from the Combine and $15K from the Verizon Challenge, was the only Columbia team to advance to the final Combine round and is among three teams from Columbia to win the Verizon Challenge.

Formed by two computer vision researchers, Jie Feng and Svebor Karaman, EyeStyle employs computer vision and machine learning technology to analyze images and automatically identify and extract visual characteristics—color, shape, texture, pattern—from objects within the image. Among the possible applications are image search and automatic metadata generation.

An immediate commercial niche is online clothes shopping. Says Feng, “It’s very natural to get inspiration from seeing what other people are wearing. But finding an item of clothing is not easy. A Google search requires text, which works well when you know precisely what you’re looking for or the product is uniquely specified in some way. But using text to describe clothing, which is about style and visual impression, is hard. It’s much simpler take a photo or click on an image, and get back a list of similar products. Fashion by nature is communicated visually, why not discover it in the way it’s meant to be.”

With online shopping a major Internet activity, the demand for image search is growing (“LIKEtoKNOW.it” currently has 2M followers on Instagram). Feng and Karaman—one a computer science PhD student advised by Shih-Fu Chang and the other a postdoc in Columbia’s Digital Video | Multimedia Lab—are looking to technology to both streamline the image search workflow (having EyeStyle automatically select the search category, for instance) and to improve search capabilities, perhaps by using 3D models to more realistically represent how clothing is worn or to better train the models by taking advantage of the huge corpus of user-generated images on social media.

Cross-category searches by EyeStyle allow for discovering items inspired by art, nature, and other sources.

Image search, for which Feng and Karaman have created an app, is on the consumer side. On the retail or B2B end, Feng and Karaman are investigating how EyeStyle can best help retailers and media companies, both to draw customers to their stores or sites and to better organize their data and inventories.

Says Feng, “The combination of computer vision and machine learning will help shape the next generation of retail technology. With the funding and mentoring provided through the Combine and the Verizon Challenge, we’ll be able to explore and validate different business-use cases. We feel very excited.”

– Linda Crane
Posted 3/2/17

Month: March 2017