Lydia Chilton on computational design: Combining human creativity with computation

For the headline Liquid Water Found on Mars, which response is the least funny? Hint: One is professionally done, and two are crowdsourced. Voting results at end.

Creativity and computation are often thought to be incompatible: one open-ended and requiring imagination, originality of thought, and perhaps even a little magic; the other logical, linear, and broken down into concrete steps. But solving the hard problems of today in medicine, environmental science, biology, and software engineering will require both.

Lydia Chilton, Assistant Professor

For Lydia Chilton, who joined the Computer Science department this fall, inventing new solutions is fundamentally about design. “When people start solving a problem, they often brainstorm over a broad range of possibilities,” says Chilton, whose research focuses on human-computer interaction and crowdsourcing. “Then there is a mysterious process by which those ideas are transformed into a solution. How can we break down this process into computational steps so that we can repeat them for new problems?” This question motivates her research into tools that are part human intelligence, part computer intelligence.

How this works in practice is illustrated by a pipeline she built to automatically generate visual metaphors, where two objects, similar in shape but having different conceptual associations, are blended to create something entirely new. It’s a juxtaposition of images and concepts intended to communicate a new idea, doing so in a way that grabs people’s attention by upending expectations.

A pipeline for creating visual metaphors by synthesizing two objects and two concepts.

Chilton decomposes the process of creating visual metaphors into a series of microtasks where people and machines collaborate by working on those microtasks they are good at. Defining the microtasks and the pipeline to make them flow together coherently is the major intellectual piece.

“The key is to identify the pieces you will need, and what relationships the pieces need to have to fit together. After you define it that way, it becomes a search problem.” Because it’s a search problem over conceptual spaces computers don’t fully understand, Chilton has people fill in the gaps and direct the search. People might examine the space of objects representing Starbucks and the space representing Summer, picking the simplest, most meaningful, and most iconic ones. The computer then searches for pairs of similarly shaped objects (as annotated by people), blending them together into an initial mockup of the visual metaphor. Humans come in at the last step to tweak the blend to be visually pleasing. At every stage in the pipeline, humans and computers work together based on their different strengths.
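The search step can be pictured with a small sketch. This is only an illustration under stated assumptions, not Chilton’s actual pipeline: the object names, the shape labels, and the function `find_blend_candidates` are all invented here, standing in for the human annotations the article describes.

```python
from itertools import product

# Hypothetical human annotations (invented for illustration): for each concept,
# the iconic objects people picked and the coarse shape they assigned to each.
ANNOTATIONS = {
    "Starbucks": {"coffee cup": "cylinder", "logo": "circle", "coffee bean": "oval"},
    "Summer": {"sun": "circle", "beach ball": "sphere", "sunscreen bottle": "cylinder"},
}


def find_blend_candidates(concept_a, concept_b, annotations=ANNOTATIONS):
    """Return (object_a, object_b, shape) triples whose annotated shapes match."""
    objects_a = annotations[concept_a]
    objects_b = annotations[concept_b]
    return [
        (obj_a, obj_b, shape_a)
        for (obj_a, shape_a), (obj_b, shape_b) in product(
            objects_a.items(), objects_b.items()
        )
        if shape_a == shape_b
    ]


# The computer's part: search every cross-concept pair for a shape match.
candidates = find_blend_candidates("Starbucks", "Summer")
for obj_a, obj_b, shape in candidates:
    print(f"blend candidate: {obj_a} + {obj_b} (shared shape: {shape})")
```

In this toy version the computer only enumerates pairs. Choosing which objects are iconic and polishing the final blend remain human steps, as in the pipeline described above.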

Crowdsourcing serves another purpose in Chilton’s research: harnessing many people’s intuitions. Foreshadowing her work on pipelines, Chilton created crowd algorithms that, more than simply aggregating uninformed opinions or intuitions, aggregate intuitions in intelligent ways to lead to correct solutions.

For example, deciphering this intentionally illegible handwriting would not be possible for any single person, but a crowd algorithm enables people to work towards a solution iteratively. People in the crowd suggest partial solutions, and then others, also in the crowd, vote on which partial solution seems like the right one to continue iterating. Those in later stages benefit from seeing contextual clues and thus build on the current solution, even if they wouldn’t have had those insights without seeing others’ partial solutions. “It’s an iterative algorithm that keeps improving on the partial solutions in every iteration until the problem is solved,” says Chilton.

Out of these scribbles, someone makes out the verb “misspelled,” providing context for others to build on. Who cares about misspellings? Maybe a teacher correcting a student; now words like “grammar” become more likely. Identifying a verb means the preceding word is likely a noun, making it easier for someone else to make out “you”. Each person starts with more information and sees something different, and a task impossible for a single person becomes 95% solved. [Iteration 6: You (misspelled) (several) (words). Please spellcheck your work next time. I also notice a few grammatical mistakes. Overall your writing style is a bit too phony. You do make some good (points), but they got lost amidst the (writing). (Signature)]
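The improve-then-vote loop can be sketched in a few lines. This is a simulation under loud assumptions, not the published crowd algorithm: the “workers” are stand-in functions that each read one word of a known answer, the voters simply prefer the most complete proposal, and the names `iterate_and_vote`, `make_improver`, and `vote_most_complete` are invented for this example.

```python
# Stand-in for the hidden text; real workers would not have access to it.
TRUTH = ["you", "misspelled", "several", "words"]


def make_improver(readable_index):
    """Simulated worker who can make out exactly one word of the handwriting."""
    def improve(partial):
        revised = list(partial)
        revised[readable_index] = TRUTH[readable_index]
        return revised
    return improve


def vote_most_complete(proposals):
    """Simulated voter: picks the index of the proposal with the most words filled in."""
    filled = [sum(word is not None for word in p) for p in proposals]
    return filled.index(max(filled))


def iterate_and_vote(initial, improvers, voters, rounds):
    """Each round, workers propose revisions and voters choose one to build on."""
    current = list(initial)
    for _ in range(rounds):
        # The current solution competes with every proposed revision.
        proposals = [current] + [improve(current) for improve in improvers]
        tallies = [0] * len(proposals)
        for vote in voters:
            tallies[vote(proposals)] += 1
        current = proposals[tallies.index(max(tallies))]
    return current


improvers = [make_improver(i) for i in range(len(TRUTH))]
voters = [vote_most_complete] * 3
blank = [None] * len(TRUTH)

after_two = iterate_and_vote(blank, improvers, voters, rounds=2)
solved = iterate_and_vote(blank, improvers, voters, rounds=4)
print(after_two)
print(solved)
```

The sketch leaves out what makes the real process interesting: in the article’s example, each filled-in word changes what later readers are able to see, whereas these simulated workers are fixed in advance.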

Allowing people to collaborate in productive ways is the power of crowd algorithms and interactive pipelines. Her research into crowdsourcing and computational design has already earned her a Facebook Fellowship and a Brown Institute Grant. This year, she was named to the inaugural class of the ACM Future of Computing Academy.

At Columbia, she will continue applying interactive pipelines and computational design to new domains: authoring compelling arguments for ideas, finding ways to integrate existing knowledge of health and nutrition into people’s existing habits and routines, and creating humor, a notoriously hard problem for computers because so much of it depends on implicit communication and emotional impact.

“Although humor is valuable as a source of entertainment and clever observations about the world, humor is also a great problem to study because it is a microcosm of the fundamental process of creating novel and useful things. If we can figure out the mechanics of humor, and orchestrate it in an interactive pipeline, we would be even further towards the grand vision of computational design that could be applied to any domain.”

Humor is also a realm where human intelligence is still necessary. Computers lack the contextual clues and real world knowledge that enable people to know intuitively that a joke insulting McDonald’s or Justin Bieber is funny but one that insults refugees or clean air is not. As she did for visual metaphors, Chilton breaks down the humor creation process into microtasks that are distributed to humans and machines. This pipeline, HumorTools, was created to compete with The Onion. It generated two of the responses to the liquid water headline. The Onion writers wrote the third.

“I pick creative problems that involve images (like visual metaphors) and text (like humor) because I think both are fundamental to the human problem-solving ability,” says Chilton. “Sometimes a picture says 1000 words, and sometimes words lay out logic in ways that might be deceiving in images. The department here is strong in graphics and in speech and language processing, and I look forward to collaborating with both groups to build tools that enhance people’s problem-solving abilities.”

One of the people she will collaborate with is Steven Feiner, who directs the Computer Graphics and User Interfaces Lab. “It’s important to extend people’s capabilities, augmenting them through computation expressed as visualization,” says Feiner. “Here, the hybrid approaches between humans and computers that Lydia is exploring are especially important because these are difficult problems that we do not yet know how to do algorithmically.”

Chilton’s first class, to be taught this spring, will be User Interface Design (W4170).

Voting results for headline Liquid Water Found on Mars.

Posted 10/17/2017
– Linda Crane

Yaniv Erlich at World’s Top 50 Innovators from the Industries of the Future

To end his talk “Can we store all of world’s data on a pickup truck,” Erlich makes the bold prediction that DNA storage could be cheaper than magnetic storage within a decade.

Alfred Aho honored by NEC C&C Foundation with 2017 C&C Prize

For major contributions to computer science education, theoretical understanding, and fostering of talent, Alfred Aho was awarded the NEC C&C Foundation C&C Prize.

Regina Barzilay, CS PhD’03, receives MacArthur “genius grant”

A former student of Kathy McKeown and now an MIT professor, Barzilay developed novel solutions for multi-document summarization along with new algorithms for identifying paraphrases. Her work was integrated as part of Columbia’s Newsblaster system.

Women and computer science at Columbia

With the national average slightly below 20%, Columbia’s relatively high percentage of women CS majors in the 2016-2017 academic year ranks it among the top US universities in attracting women to computer science.

Yaniv Erlich receives Young Faculty Award from DARPA

The Defense Advanced Research Projects Agency (DARPA) has awarded Yaniv Erlich a Young Faculty Award. The award, which identifies rising research stars in US academic institutions and introduces them to topics and issues of interest to the Department of Defense, will support Erlich’s work on DNA storage technology.

An Assistant Professor of Computer Science and Computational Biology at Columbia University and a Core Member at the New York Genome Center, Erlich does research in many facets of computational human genetics. His lab works on a wide range of topics including developing a compressed sensing approach to identify rare genetic variations, devising new algorithms for personal genomics, and using Web 2.0 information for genetic studies.

The award, which is for $1M, is in response to Erlich’s proposal “Resistant and Scalable Storage Using Semi-synthetic DNA,” which describes the use of an extended genetic alphabet to create DNA storage technology that is both immune to a broad range of interception methods and also boosts the information density of DNA storage. The proposal was submitted through Columbia’s Data Science Institute, of which Erlich is also an affiliate.

Erlich’s previous research has earned him several awards, including the Burroughs Wellcome Career Award (2013), the Harold M. Weintraub award (2010), and the IEEE/ACM-CS HPC award (2008). In 2010, he was named one of Genome Technology’s Tomorrow’s PIs.

Erlich holds a B.Sc. in computational neuroscience from Tel-Aviv University and a Ph.D. in genomics and bioinformatics from the Watson School of Biological Sciences, Cold Spring Harbor Laboratory in New York.

In February of this year, Erlich was named Chief Science Officer of MyHeritage Ltd.

Posted 10/12/2017
– Linda Crane

Why Science Turned on the DNA Mogul Championing Genetic Privacy

Team from Steven Feiner’s lab wins Grand Prize at NYC Media Lab Summit

“Travel in Large-Scale Head-Worn VR” teleports users through virtual environments by allowing them to determine orientation in advance. Another project from Feiner’s lab, “Remote Collaboration in AR and VR Using Virtual Replicas,” earned a third-place finish.

Making DNA Data Storage a Reality

Nakul Verma joins the department, bringing expertise and experience in machine learning

The number of computer science majors at Columbia is expected to increase yet again this year, driven in part by the exploding interest in machine learning. Among the 10 MS tracks, machine learning is by far the most popular, selected by 60% of the department’s master’s students (vs 12% for the second most popular).

According to Nakul Verma, who joins the department this semester as lecturer in discipline, this interest in machine learning is not likely to abate any time soon. “Machine learning is growing in popularity because it has so much applicability for fields outside computer science. Every application domain is incorporating machine learning techniques, and every traditional model is being challenged by the advent of big data.”

As a PhD student at UC San Diego, Verma gravitated toward machine learning’s theory side, what he sees as fruitful territory. “I love to learn new things, and machine learning theory has this ability to borrow ideas from other fields—mathematics, statistics, and even neuroscience. Borrowing ideas will continually grow machine learning as a field and it makes the field especially dynamic.”

Verma does his borrowing from differential geometry in mathematics, a field he had not previously studied in depth. But as machine learning shifts from strictly linear models to include nonlinear ones, new methods are needed to analyze and leverage intrinsic structure in data.

As examples of nonlinear data sets, Verma cites speech and articulations of robot motions, where data sets are high dimensional, containing many, many observations, each one associated with a potentially high number of features. However, relationships between data points may be fixed in some way so that a change in one causes a predictable change in another, giving the data an intrinsic structure. A robot might focus on key points in a gesture, analyzing how fingers move in relation to one another or to the wrist or arm. These movements are restricted along certain degrees of freedom—by joints, by the positioning of other fingers—suggesting the intrinsic structure of the data is in fact low-dimensional and that the xyz points of these joints form a manifold, or curved surface, in space.
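A toy example makes the idea of intrinsic low-dimensional structure concrete. The sketch below is an assumption-laden illustration, not Verma’s method: it models a hypothetical two-segment arm in a plane (rather than full xyz data), with invented segment lengths and function names. Every observation has four coordinates, yet two joint angles fully determine it, and the joints impose two constraints that every observation satisfies.

```python
import math

# Hypothetical segment lengths for a planar two-joint arm (illustrative values).
UPPER_ARM = 1.0
FOREARM = 0.8


def arm_observation(shoulder_angle, elbow_angle):
    """Map two joint angles to a 4-number observation: elbow (x, y) and hand (x, y)."""
    elbow_x = UPPER_ARM * math.cos(shoulder_angle)
    elbow_y = UPPER_ARM * math.sin(shoulder_angle)
    hand_x = elbow_x + FOREARM * math.cos(shoulder_angle + elbow_angle)
    hand_y = elbow_y + FOREARM * math.sin(shoulder_angle + elbow_angle)
    return (elbow_x, elbow_y, hand_x, hand_y)


def satisfies_joint_constraints(observation, tol=1e-9):
    """Check the two rigid-segment constraints that restrict where points can lie."""
    elbow_x, elbow_y, hand_x, hand_y = observation
    upper_ok = abs(math.hypot(elbow_x, elbow_y) - UPPER_ARM) < tol
    fore_ok = abs(math.hypot(hand_x - elbow_x, hand_y - elbow_y) - FOREARM) < tol
    return upper_ok and fore_ok


# 100 observations in a 4-dimensional space, generated from only 2 free parameters.
observations = [
    arm_observation(0.1 * i, 0.05 * j) for i in range(10) for j in range(10)
]
all_on_manifold = all(satisfies_joint_constraints(obs) for obs in observations)
print(len(observations), len(observations[0]), all_on_manifold)
```

Four coordinates minus two constraints leaves two degrees of freedom, so these points trace out a two-dimensional curved surface inside the four-dimensional observation space.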

Compressing a manifold surface into lower dimensions while retaining geospatial relationships among data points.

For a way to compress a manifold surface into lower dimensions without collapsing the underlying intrinsic structure, Verma looked to John Nash’s process for embedding manifolds, learning the math (or getting used to it) well enough to understand how it could be applied to machine learning. Where Nash worked in terms of equations, Verma works with actual data, so the problem, laid out in his thesis and other papers, is to derive from Nash’s technique an algorithm that works on today’s data sets.

While Verma’s thesis was highly theoretical, it had almost immediate practical applications. For four years at Janelia Research Campus HHMI, Verma helped geneticists and neuroscientists understand how genetics affects the brain to cause different behaviors. Working with fruit flies and other model organisms, researchers would modify certain genes thought to control aggression, social interactions, mating, and other behaviors and then record the organisms’ activity. The scale of data—from the thousands of modifications to the recorded video and audio imagery along with the neuronal recordings—was immense. Verma’s job was to tease out from the pile of data the small threads of how one change affects another, to pinpoint the relationships between the genetic modifications to the brain and the observed behavior. Verma’s work on manifolds and understanding intrinsic structure in data was crucial in developing practical yet statistically sound biological models.

Theory and application go hand in hand in machine learning, and the classes Verma will be teaching will contain a good dose of both, with the exact mix calibrated differently for grads vs undergrads. In either case, Verma sees a solid foundation on basic principles as necessary for understanding how a model is set up or why a certain framework is better than another. “The practical applications help reinforce the theory side of things. Teaching random forests should explain the basic theory but show also how it’s used in the real world. It’s not just some bookish knowledge; it’s one way Amazon and other companies reduce fraud.” Verma talks from experience, having worked at Amazon before going to Janelia.

But teaching has always been his ultimate goal. While at Janelia, Verma was awarded a Teaching Fellowship, and last summer he taught at Columbia as an adjunct. Says Verma, “Helping students achieve their goals and sharing their excitement for the subject is one of the most rewarding experiences of my academic career.”

Posted 10/02/2017
– Linda Crane
