One of the women to be honored with her own card is Kathy McKeown, a computer scientist working in the field of natural language processing. The first female full-time professor in Columbia’s school of engineering, she was also the first woman to serve as department chair. Currently the Henry and Gertrude Rothschild Professor of Computer Science, she is also the inaugural director of Columbia’s multidisciplinary Data Science Institute. And now she is the seven of clubs.
Question: Congratulations on being included among the Notable Women of Science card deck. What does it feel like to see your picture on the seven of clubs?
Kathy McKeown: It is really exciting to be part of this group, to be part of such a distinguished group of computer scientists.
You started at the Columbia Engineering school in 1982 and were for a time the only full-time female professor in Columbia’s school of engineering. From your experience seeing other women follow soon after, what’s most helpful for women wanting to advance in computer science?
KM: Just having other women around makes a big difference. Women can give one another advice and together support issues that women in particular care about. Having a woman in a senior position is especially helpful. When I was department chair, several undergraduate women approached me about starting a Women in Computer Science group. As a woman, I understood the need for such a group and in my position it was easier for me to present the case.
Of course, getting women into engineering and computer science requires making sure girls in school remain interested in these subjects as they approach college, and I think one way that is done is by showing them how stimulating and gratifying it can be to design and build something that helps others.
Talking of interesting work that helps others, you recently received an NSF grant for “Describing Disasters and the Ensuing Personal Toll.” What is the work to be funded?
KM: The goal is to build a system for automatically generating a comprehensive description of a disaster, one that combines objective, factual information—the specific sequence of what happened when—with compelling, emotional first-person accounts of people impacted by the disaster. We will use natural language processing techniques to find the relevant articles and stories, and then weave them together into a single, overall resource so people can query the information they need, both as the disaster unfolds and months or years later.
It might be emergency responders or relief workers needing to know where to direct their efforts first; it might be journalists wanting to report current facts or researching past similar disasters. It might be urban planners wanting to compare how two different but similar neighborhoods fared under the disaster, maybe discovering why one escaped heavy damage while the other did not. Or it might be someone who lived through the event coming back years afterward to remember what it was like.
It’s not a huge grant—it’s enough for me and two graduate students for three years—but it’s a very appealing one, and it allows undergraduates to work with us and do research. Already we have seven students working on this: the two funded graduate students and five undergraduates.
It makes sense that journalists and emergency responders would want an objective account of a disaster or event. Why include personal stories?
KM: Because it’s the personal stories that let others understand fully what is going on. Numbers tell one side of the story, but emotions tell another. During Katrina, there were reports of people marooned in the Superdome in horrendous conditions—reports that were initially dismissed by authorities. It wasn’t until reporters went there and interviewed people and put up their stories with pictures and videos that the rest of us could actually see the true plight and desperation of these people, and you realize at the same time it could be you. It wasn’t possible to ignore what was happening after that, and rescue efforts accelerated.
What was the inspiration?
KM: For us it was Hurricane Sandy, since many of us were here in New York or nearby when Sandy struck. One student, whose family on the Jersey Shore was hard-hit, was particularly motivated to look at what could be done to help people.
But more importantly, Sandy as an event has interesting characteristics. It was large scale and played out over multiple days and multiple areas, and it triggered other events. Being able to provide a description of an event at this magnitude is hard and poses an interesting problem. The project is meant to cover any type of disaster—earthquakes, terror attacks, floods, mass shootings—where the impact is long-lasting and generates sub-events.
How will it work?
KM: The underlying system will tap into streaming social media and news to collect information, using techniques of natural language processing and artificial intelligence to find those articles and stories pertinent to a specific disaster and the sub-events it spawns. Each type of disaster is associated with a distinct vocabulary and we’ll build language models to capture this information.
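As a rough illustration of the per-disaster-type language-model idea McKeown describes, the sketch below estimates a smoothed unigram distribution from example texts for each disaster type and scores a new article against each. This is a toy, not the project's actual models: the tiny corpora, the add-one smoothing, and the unigram assumption are all illustrative simplifications.

```python
# Toy per-disaster-type language models: estimate a smoothed unigram
# distribution from sample texts, then score new text against it.
import math
from collections import Counter

def unigram_model(texts):
    """Build a smoothed unigram probability function from example texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts)
    # Add-one smoothing so unseen words still get a small probability.
    return lambda w: (counts[w] + 1) / (total + vocab + 1)

def log_likelihood(model, text):
    """Higher means the text looks more like that disaster type's vocabulary."""
    return sum(math.log(model(w)) for w in text.lower().split())
```

In use, an incoming article would be scored under each disaster type's model, and the highest log-likelihood would suggest which distinct vocabulary it best matches.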
Obviously we’ll look at established news sites for factual information. To include first-person stories, it’s not yet entirely clear where to look since there aren’t well-defined sites for this type of content. We will be searching blogs and discussion boards and wherever else we can discover personal accounts.
For users—and the intent is for anyone to be able to use the system—we currently envision some type of browser-based interface. It will probably be visual and may be laid out by the locations where things happened. Clicking on one location will present descriptions and a timeline of what happened there at different times, and each sub-event will be accompanied by a personal account.
Newsblaster is already finding articles that cover the same event. Will you be building on top of Newsblaster?
KM: Yes—after all, Newsblaster represents 11 years of experience in auto-generating summaries and contains years of data, though we will modernize it to include social media, which Newsblaster doesn’t currently do. We also need to expand the scope of how Newsblaster uses natural language processing. Currently it relies on common language between articles both to find articles about the same event and to produce summaries. I’m simplifying here, but Newsblaster works by extracting nouns and other important words from articles and then measuring statistical similarity of the vocabulary in these articles to determine which articles cover the same topic.
In a disaster covering multiple days with multiple sub-events, there is going to be a lot less common language and vocabulary among the articles we want to capture. A news item about flooding might not refer directly to Sandy by name; it may describe the flooding only as “storm-related” but we have to tie this back to the hurricane itself even when two articles don’t share a common language. There’s going to be more paraphrasing also as journalists and writers, to avoid being repetitive after days of writing about the same topic, change up their sentences. It makes it harder for language tools that are looking for the same phrases and words.
Determining semantic relatedness is obviously the key, but we’re going to need to build new language tools and approaches that don’t rely on the explicit presence of shared terms.
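A minimal sketch of the vocabulary-overlap approach McKeown describes for Newsblaster—term-frequency vectors compared by cosine similarity—might look like the following. The real system is far more sophisticated; the tokenizer, weighting, and threshold here are illustrative stand-ins.

```python
# Vocabulary-overlap clustering sketch: two articles are judged to cover
# the same event when their term-frequency vectors are similar enough.
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; a stand-in for real linguistic preprocessing."""
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def same_event(doc1, doc2, threshold=0.3):
    """Guess whether two articles cover the same event by shared vocabulary."""
    return cosine(Counter(tokens(doc1)), Counter(tokens(doc2))) >= threshold
```

The sketch also shows exactly where such methods break down for multi-day disasters: an article that calls flooding only "storm-related" shares almost no vocabulary with one about the hurricane itself, so their similarity falls below any reasonable threshold.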
How will “Describing Disasters” find personal stories?
KM: That’s one question, but the more interesting question is how do you recognize a good, compelling story people would want to hear? Not a lot of people have looked at this. While there is work on what makes scientific writing good, recognizing what makes a story compelling is new research.
We’re starting by investigating a number of theories drawn from linguistics and from literature on what type of structure or features are typically found in narratives. We’ll be looking especially at the theory of the sociolinguist William Labov of the University of Pennsylvania who has been looking at the language that people use when telling stories. There is often an orientation in the beginning that tells you something about location, or a sequence of complicating actions that culminates in an event, which Labov calls the most reportable event—something shocking or involving life and death for which you tend to get a lot of evaluative material. One student is now designing a most-reportable-event classifier but in a way that is not disaster-specific. We don’t want to have to enumerate it for different events, to have to explicitly state that the most reportable event for a hurricane is flooding, and that it’s something different for a tornado or a mass shooting.
What are the hard problems?
KM: Timelines that summarize changes over time, that describe how something that happened today is different from what we knew yesterday. In general that’s a hard problem. On one hand, everything is different, but most things that are different aren’t important. So how do we know what’s new and different that is important and should be included? Some things that happen on later days may be connected to the initial event, but not necessarily. How do we tell?
Being able to connect sub-events triggered by an initial event can be hard for a program to do automatically. Programs can’t see the correlation necessarily. We’re going to have to insert more intelligence for this to happen.
Here’s one example. There was a crane hanging over midtown Manhattan after Hurricane Sandy came in. If we were using our normal expectations of hurricane-related events, we wouldn’t necessarily think of a crane dangling over a city street from a skyscraper. This crane sub-event spawned its own sub-event, a terrible traffic jam. Traffic jams happen in New York City. How would we know that this one traffic jam was a result of Hurricane Sandy and not some normal everyday event in New York?
It does seem like an interesting problem. How will you solve it?
KM: We don’t know, at least not now. Clearly geographic and temporal information is important, but large-scale disasters can encompass enormous areas and the effects go on for weeks. We need to find clues in language to draw the connections between different events.
We’re just getting started. And that’s the fun of research, the challenge of the difficult. You start with a question or a goal without an easy answer and you work toward it.
There are other problems, of course. The amount of data surrounding a large-scale disaster is enormous. Somehow we’ll have to wade through massive amounts of data to find just what we need. Validation is another issue. How do we know if we’ve developed something that is correct? We’ll need metrics to evaluate how well the system is working.
Another issue will be reining in everything we will want to do. There is so much opportunity with this project, particularly for multidisciplinary studies. We can easily see pulling in journalism students, and those working in visualization. To be able to present the information in an appealing way will make the system more usable to a wider range of people, and it may generate new ways of looking at the data. We see this project as a start; there is a lot of potential for something bigger.
About Kathy McKeown
Kathy McKeown, the Henry and Gertrude Rothschild Professor of Computer Science and inaugural Director of the multidisciplinary Data Science Institute, is a leading scholar and researcher in the field of natural language processing, particularly in the field of text summarization.
With a focus on big data, her research also extends to question answering, natural language generation, multimedia explanation, digital libraries, and multilingual applications. To demonstrate and test new technologies for multi-document summarization, clustering, and text categorization, her research group launched in 2001 the Columbia Newsblaster, an online system for automatically tracking the day’s news. Currently, she leads a large research project involving prediction of technology emergence from a large collection of journal articles.
Education and career highlights
1979
MS, Computer and Information Science, University of Pennsylvania.
1982
PhD, Computer and Information Science, University of Pennsylvania.
Joins Columbia School of Engineering and Applied Science (SEAS).
1989
First woman professor in SEAS to receive tenure.
1998-2003
First woman to serve as department chair.
Awards and recognition
2010
Columbia Great Teacher Award (bestowed by students)
Anita Borg Woman of Vision Award for Innovation
2003
Fellow, Association for Computing Machinery
2000
Outstanding Woman Scientist, New York Association of Women in Science
1994
Fellow, American Association for Artificial Intelligence
1991
NSF Faculty Award for Women
1985
National Science Foundation Presidential Young Investigator Award
Notable papers and publications
Papers relevant to Describing Disasters are listed below. (For a complete list of Kathy McKeown’s publications, go to her webpage.)
Correlating Speaker Gestures in Political Debates with Audience Engagement Measured via EEG, a paper presented earlier this month at ACM Multimedia, describes the work done by Columbia University researchers to identify what gestures are most effective at getting an audience to pay attention. Using EEG data of study participants watching clips of the 2012 presidential debates, researchers found atypical, or extremal, gestures to be the strongest indicator of listeners’ engagement. This finding not only benefits speakers but may lead to methods for automatically indexing video.
When people talk, they often gesture to add emphasis and help clarify what they are saying. People who speak professionally—lecturers, teachers, politicians—know this either instinctively or learn it from experience, and it’s supported by past studies showing that particular gestures are used for various semantic purposes.
But what gestures are most effective in getting listeners to pay close attention to what a speaker is saying?
To find out, Columbia University computer science and biomedical engineering researchers, led by John Zhang, a PhD student of Prof. John R. Kender, set up an experiment to correlate the level of audience interest with specific hand gestures made by a speaker. The researchers asked 20 participants—equally divided along gender and political party lines—to view video clips of the 2012 presidential debates between Obama and Romney. The intent was to correlate the candidates’ gestures with the level of interest exhibited by the audience.
Gestures were automatically extracted from the video using an algorithm that incorporated computer-vision techniques to identify and track hand motions and distinguish left and right hands even when the hands were clasped. The algorithm also detected gestural features the researchers suspected beforehand might correlate with audience attention, specifically velocity but also change in direction and what is called an invitation feature, where the hands are spread in an open manner. (In previous work studying teacher gestures, the authors were the first to identify this type of gesture.)
Another feature was likelihood of hand position. Though gestures vary greatly from person to person, individuals have their own habits. They may move their hands up and down or left and right, but they tend to do so within a defined area, or home base, only occasionally moving their hands outside it. These non-habitual, or extremal, poses became an additional feature.
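The home-base idea above can be sketched simply: model a speaker's habitual hand-position region from tracked coordinates, then flag frames where the hand leaves it. The actual system worked on vision-tracked video; the coordinate representation and the two-standard-deviation cutoff below are illustrative assumptions, not the paper's method.

```python
# Sketch of extremal-pose detection: a "home base" is estimated from
# the mean and spread of tracked hand positions, and frames that fall
# well outside it are flagged as extremal.
import statistics

def extremal_poses(positions, k=2.0):
    """positions: list of (x, y) hand coordinates, one per frame.
    Returns indices of frames where the hand leaves its home base."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    out = []
    for i, (x, y) in enumerate(positions):
        # A pose is extremal if it strays more than k standard
        # deviations from the habitual region on either axis.
        if abs(x - mx) > k * sx or abs(y - my) > k * sy:
            out.append(i)
    return out
```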
From tracking gestures, it was immediately apparent that Romney used his hands more and made more use of extremal gestures. This was true in both debates 1 and 3. (The study did not look at debate 2, where the format—candidates stood and walked while holding a microphone—could bias the gestures.) Obama, widely criticized for his lackluster performance in debate 1, gestured less and made zero extremal gestures in the first debate. If gestures alone had decided the winner, Obama would have lost debate 1. (He improved in the third debate, probably after coaching.)
For gauging the level of interest, researchers chose EEG data since it is a direct, non-intrusive measure of brain activity already known to capture information related to attention. Electrodes were attached to the scalps of the 20 study participants to record brain activity while they watched debate clips. The data capture was carried out in the lab of Paul Sajda, a professor in the Biomedical Engineering department, who also helped interpret the data. (EEG data is not easy to work with. The signals are weak and noisy, and the capture process itself requires an electrostatically shielded room. The data was also voluminous—46 electrodes for 20 participants tracked for 47 minutes at 2,000 data points per second—and was reduced using an algorithm written for the purpose.)
While the EEG data showed many patterns of activity, researchers identified three main components in the underlying signal. Two corresponded to specific areas of the brain: the first to electrodes near the visual cortex, where more activity suggests that audience members were actively watching a candidate; the second was generated near the prefrontal cortex, the site of executive function and decision-making, indicating that audience members were thinking and making decisions. (It was not possible to localize a source for the third component.)
Once time-stamped, the EEG data, averaged across subjects, was aligned with gestures in the video so researchers could locate statistically significant correlations between a gesture feature (direction change, velocity, and extremal pose) and a strong EEG component. Moments of engagement were defined as a common neural response across all subjects. Responses not shared with other subjects might indicate lack of engagement (i.e., boredom), as each participant would typically focus on different stimuli.
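The alignment step reduces to correlating two time series: a per-interval gesture-feature signal and a per-interval EEG-component strength. The sketch below shows the standard Pearson correlation such an analysis rests on; the series themselves, and the choice of interval, are illustrative assumptions, not the study's actual pipeline.

```python
# Pearson correlation between a gesture-feature series and an
# EEG-component series sampled over the same time intervals.
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Guard against constant series, which have undefined correlation.
    return cov / (sx * sy) if sx and sy else 0.0
```

A value near +1 for, say, extremal-pose counts against prefrontal-component strength would be the kind of statistically significant correlation the researchers looked for.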
Extremal gestures turned out to be the strongest indication of listeners’ engagement. No matter how researchers subdivided the participants—Democrats, Republicans, females, males, all—what really triggered people’s attention was something new and different in the way a candidate gestured.
This finding that extremal poses correlate with audience engagement should help speakers stress important points.
And it may also provide an automatic way to index video, an increasingly necessary task as the amount of video continues to explode. Video is hard to chunk meaningfully, and video indexing today often relies on a few extracted images and standard fast-forwarding and reversing. An algorithm trained to find extremal speaker gestures might quickly and automatically locate video highlights. Students of online courses, for example, could then easily skip to the parts of a lecture needing the most review.
More work is planned. The researchers looked only at correlation, leaving for future work the task of prediction, where some EEG data is set aside to see if it’s possible to use gestures to predict where there is engagement. Whether certain words are more likely to get audience reaction is another area to explore. In a small step in this direction, the researchers looked at words with at least 15 occurrences, finding that two words, business and (interest) rate, were unusual in their ability to draw attention. Though the data is insufficient for any definite conclusions, it does suggest potentially interesting results.
Nor did researchers look at what the hand was doing, whether it was pointing, forming a fist, or posed in another manner. The current work focused only on hand position, but this was enough to show that gesturing can effectively engage and influence audiences.
– Linda Crane
About the researchers
John R. Kender is a Professor of Computer Science at Columbia University. His primary research interests are in the use of statistical and semantic methods for navigating through collections of videos, particularly those showing human activities. He received his PhD from Carnegie-Mellon University in 1980, specializing in computer vision and artificial intelligence, and he was the first professor hired in its Robotics Institute. Since 1981, he has been one of the founding faculty of the Department of Computer Science of Columbia University. He was named one of the first National Science Foundation Presidential Young Investigators, and he has served the School of Engineering and Applied Science at Columbia both as Acting Dean of Students and as Vice Dean.
His awards include the Great Teacher Award of the Society of Columbia Graduates, and the Distinguished Faculty Teaching Award of the Columbia Engineering School Alumni Association. He has graduated 25 PhD students, has published well over 200 refereed articles, and holds multiple patents and patent applications in computer vision, video analysis, video summarization, and video browsing.
John Zhang received his PhD in Computer Science in 2013 from Columbia University, where his research focused on the significance of gestures in unstructured video and the relationship between gestures and semantics. He developed methods to automatically recognize and classify gestures, methods that could be used in tools for efficiently browsing and searching video.
As an undergraduate he attended the University of Calgary, where he was awarded double Bachelor of Science degrees with distinction in Computer Science and Mathematics. He is a recipient of the 2008 International Fulbright Science & Technology Award.
Currently he is employed at Google in NYC.
Paul Sajda is Professor of Biomedical Engineering, Electrical Engineering and Radiology at Columbia University. He is Director of the Laboratory for Intelligent Imaging and Neural Computing (LIINC) and Co-Director of Columbia’s Center for Neural Engineering and Computation (CNEC). His research focuses on neural engineering, neuroimaging, computational neural modeling and machine learning applied to the study of rapid decision making in the human brain. Prior to Columbia he was Head of the Adaptive Image and Signal Processing Group at the David Sarnoff Research Center in Princeton, NJ. He received his BS in Electrical Engineering from MIT and his MS and PhD in Bioengineering from the University of Pennsylvania. He is a recipient of the NSF CAREER Award and the Sarnoff Technical Achievement Award, and is a Fellow of the IEEE and the American Institute of Medical and Biological Engineering (AIMBE). He is currently Editor-in-Chief of the IEEE Transactions on Neural Systems and Rehabilitation Engineering. He has been involved in several technology start-ups and is a co-Founder and Chairman of the Board of Neuromatters, LLC, a neurotechnology research and development company.