Want the attention of the audience?
Introduce a new gesture
"Correlating Speaker Gestures in Political Debates with Audience Engagement Measured via EEG," a paper presented earlier this month at ACM Multimedia, describes work by Columbia University researchers to identify which gestures are most effective at getting an audience to pay attention. Using EEG data from study participants watching clips of the 2012 presidential debates, the researchers found atypical, or extremal, gestures to be the strongest indicator of listeners' engagement. This finding not only benefits speakers but may lead to methods for automatically indexing video.
When people talk, they often gesture to add emphasis and help clarify what they are saying. People who speak professionally—lecturers, teachers, politicians—know this either instinctively or learn it from experience, and it's supported by past studies showing that particular gestures are used for various semantic purposes.
But what gestures are most effective in getting listeners to pay close attention to what a speaker is saying?
To find out, Columbia University computer science and biomedical engineering researchers, led by John Zhang, a PhD student of Prof. John R. Kender, set up an experiment to correlate the level of audience interest with specific hand gestures made by a speaker. The researchers asked 20 participants—equally divided along gender and political party lines—to view video clips of the 2012 presidential debates between Obama and Romney. The intent was to match the candidates' gestures against the level of interest exhibited by the audience.
Gestures were automatically extracted from the video using an algorithm that incorporated computer-vision techniques to identify and track hand motions and distinguish left and right hands even when the hands were clasped. The algorithm also detected gestural features the researchers suspected beforehand might correlate with audience attention, specifically velocity but also change in direction and what is called an invitation feature, where the hands are spread in an open manner. (In previous work studying teacher gestures, the authors were the first to identify this type of gesture.)
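The velocity and direction-change features can be illustrated with a short sketch. Everything below is an assumption for illustration only: the function name, the frame rate, and the representation of tracked hand positions as (x, y) coordinates per frame are hypothetical, and the paper's actual feature definitions may differ.

```python
import numpy as np

def gesture_features(positions, fps=30.0):
    """Compute per-frame speed and direction-change features from a
    sequence of tracked hand positions (an (N, 2) array of x, y
    coordinates, one row per video frame)."""
    positions = np.asarray(positions, dtype=float)
    deltas = np.diff(positions, axis=0)            # frame-to-frame displacement
    speed = np.linalg.norm(deltas, axis=1) * fps   # velocity magnitude
    # Direction change: angle between consecutive displacement vectors.
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)
    unit = deltas / np.maximum(norms, 1e-9)
    cos_turn = np.sum(unit[:-1] * unit[1:], axis=1)
    turn_angle = np.arccos(np.clip(cos_turn, -1.0, 1.0))
    return speed, turn_angle
```

A hand moving steadily in a straight line yields constant speed and near-zero turn angles; a sharp reversal shows up as a turn angle near pi.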
Another feature was the likelihood of a hand position. Though gestures vary greatly from person to person, individuals have their own habits. They may move their hands up and down or left and right, but they tend to do so within a defined area, or home base, only occasionally moving their hands outside it. These non-habitual, or extremal, poses became an additional feature.
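One simple way to operationalize the home-base idea is to flag positions that fall unusually far from a speaker's typical hand position. This is a minimal sketch, not the paper's method: the paper models position likelihood, while the distance-from-median rule and the quantile threshold below are stand-in assumptions.

```python
import numpy as np

def extremal_poses(positions, quantile=0.95):
    """Flag hand positions outside a speaker's 'home base': positions whose
    distance from the speaker's median hand position exceeds the given
    quantile of all such distances."""
    positions = np.asarray(positions, dtype=float)
    center = np.median(positions, axis=0)          # crude home-base center
    dist = np.linalg.norm(positions - center, axis=1)
    threshold = np.quantile(dist, quantile)
    return dist > threshold                        # boolean mask of extremal frames
```

Because the threshold is speaker-relative, the same absolute hand position could be habitual for one speaker and extremal for another, matching the observation that each individual has their own home base.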
From tracking gestures, it was immediately apparent that Romney used his hands more and made more use of extremal gestures. This was true in both debates 1 and 3. (The study did not look at debate 2, where the format—candidates stood and walked while holding a microphone—could bias the gestures.) Obama, widely criticized for his lackluster performance in debate 1, gestured less and made zero extremal gestures in that debate. Had gestures alone decided the winner, Obama would have lost debate 1. (He would improve in the third debate, probably after coaching.)
For gauging the level of interest, researchers chose EEG data since it is a direct, non-intrusive measure of brain activity already known to capture information related to attention. Electrodes were attached to the scalp of the 20 study participants to record brain activity while they watched debate clips. The data capture was carried out in the lab of Paul Sajda, Professor of Biomedical Engineering at Columbia.
While the EEG data showed many patterns of activity, the underlying signal was noisy; from it, researchers identified three main components. Two corresponded to specific areas of the brain. The first came from electrodes near the visual cortex, where more activity suggests audience members were actively watching a candidate. The second component was generated near the prefrontal cortex, the site of executive function and decision-making, indicating that audience members were thinking and making decisions. (It was not possible to specify a source for the third component.)
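Separating a few dominant components from noisy multichannel recordings is a standard source-separation step. The sketch below uses plain PCA via the singular value decomposition purely as a stand-in; the article does not specify the study's actual decomposition method, and the variable names and shapes are illustrative assumptions.

```python
import numpy as np

def top_components(eeg, k=3):
    """Extract the k strongest components from a (samples x channels) EEG
    recording via PCA. The left singular vectors scaled by the singular
    values give the component time courses, ordered by strength."""
    eeg = np.asarray(eeg, dtype=float)
    centered = eeg - eeg.mean(axis=0)              # remove per-channel offset
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]                        # (samples x k) time courses
```

Each returned column is one component's activity over time, with the first column capturing the most variance; real EEG pipelines would also filter and reject artifacts before this step.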
Once time-stamped, the EEG data, averaged across subjects, was aligned with gestures in the video so researchers could locate statistically significant correlations between a gesture feature (direction change, velocity, and extremal pose) and a strong EEG component. Moments of engagement were defined as a common neural response across all subjects. Responses not shared with other subjects might indicate lack of engagement (i.e., boredom), as each participant would typically focus on different stimuli.
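At its core, the alignment step pairs a gesture-feature time series with an EEG component time series and measures how strongly they covary. The one-liner below shows a plain Pearson correlation as a simplified sketch; the study's statistical procedure (including significance testing across subjects) is more involved, and the function name is hypothetical.

```python
import numpy as np

def feature_component_correlation(feature, eeg_component):
    """Pearson correlation between a time-aligned gesture feature (e.g.,
    velocity or extremal pose) and an EEG component averaged across
    subjects. Both inputs must be sampled on the same time grid."""
    feature = np.asarray(feature, dtype=float)
    eeg_component = np.asarray(eeg_component, dtype=float)
    assert feature.shape == eeg_component.shape, "series must be time-aligned"
    return np.corrcoef(feature, eeg_component)[0, 1]
```

A value near 1 would indicate that moments of strong gesturing coincide with strong shared neural response, which is the kind of relationship the researchers were looking for.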
Extremal gestures turned out to be the strongest indication of listeners' engagement. No matter how researchers subdivided the participants—Democrats, Republicans, females, males, all—what really triggered people's attention was something new and different in the way a candidate gestured.
This finding that extremal poses correlate with audience engagement should help speakers stress important points.
And it may also provide an automatic way to index video, an increasingly necessary task as the amount of video continues to explode. Video is hard to chunk meaningfully, and video indexing today often relies on a few extracted images and standard fast-forwarding and reversing. An algorithm trained to find extremal speaker gestures might quickly and automatically locate video highlights. Students of online courses, for example, could then easily skip to the parts of a lecture needing the most review.
More work is planned. The researchers looked only at correlation, leaving for future work the task of prediction, where some EEG data is set aside to see if it's possible to use gestures to predict where there is engagement. Whether certain words are more likely to get audience reaction is another area to explore. In a small step in this direction, the researchers looked at words with at least 15 occurrences, finding that two words, business and (interest) rate, were unusual in their ability to draw attention. Though the data is insufficient for any definite conclusions, it does suggest potentially interesting results.
Nor did researchers look at what the hand was doing, whether it was pointing, forming a fist, or posed in another manner. The current work focused only on hand position, but this was enough to show that gesturing can effectively engage and influence audiences.
- Linda Crane
About the researchers
John R. Kender is a Professor of Computer Science at Columbia University. His primary research interests are in the use of statistical and semantic methods for navigating through collections of videos, particularly those showing human activities. He received his PhD from Carnegie Mellon University in 1980, specializing in computer vision and artificial intelligence, and he was the first professor hired in its Robotics Institute. Since 1981, he has been one of the founding faculty of the Department of Computer Science of Columbia University. He was named one of the first National Science Foundation Presidential Young Investigators, and he has served the School of Engineering and Applied Science at Columbia both as Acting Dean of Students and as Vice Dean.
His awards include the Great Teacher Award of the Society of Columbia Graduates, and the Distinguished Faculty Teaching Award of the Columbia Engineering School Alumni Association. He has graduated 25 PhD students, has published well over 200 refereed articles, and holds multiple patents and patent applications in computer vision, video analysis, video summarization, and video browsing.
John Zhang received his PhD in Computer Science in 2013 from Columbia University where his research focused on the significance of gestures in unstructured video and examining the relationship between gestures and semantics. He developed methods to automatically recognize and classify gestures, methods that could be used in tools for efficiently browsing and searching video.
As an undergraduate he attended the University of Calgary, where he was awarded a double Bachelor of Science degree with distinction in Computer Science and Mathematics. He is a recipient of the 2008 International Fulbright Science & Technology Award.
Currently he is employed at Google in NYC.
Paul Sajda is Professor of Biomedical Engineering, Electrical Engineering and Radiology at Columbia University. He is Director of the Laboratory for Intelligent Imaging and Neural Computing (LIINC) and Co-Director of Columbia's Center for Neural Engineering and Computation (CNEC). His research focuses on neural engineering, neuroimaging, computational neural modeling and machine learning applied to the study of rapid decision making in the human brain. Prior to Columbia he was Head of the Adaptive Image and Signal Processing Group at the David Sarnoff Research Center in Princeton, NJ. He received his BS in Electrical Engineering from MIT and his MS and PhD in Bioengineering from the University of Pennsylvania. He is a recipient of the NSF CAREER Award and the Sarnoff Technical Achievement Award, and is a Fellow of the IEEE and the American Institute for Medical and Biological Engineering (AIMBE). He is currently the Editor-in-Chief of the IEEE Transactions on Neural Systems and Rehabilitation Engineering. He has been involved in several technology start-ups and is a co-Founder and Chairman of the Board of Neuromatters, LLC, a neurotechnology research and development company.