Want the attention of the audience? Introduce a new gesture

“Correlating Speaker Gestures in Political Debates with Audience Engagement Measured via EEG,” a paper presented earlier this month at ACM Multimedia, describes work by Columbia University researchers to identify which gestures are most effective at getting an audience to pay attention. Using EEG data from study participants watching clips of the 2012 presidential debates, the researchers found atypical, or extremal, gestures to be the strongest indication of listeners’ engagement. The finding not only benefits speakers but may lead to methods for automatically indexing video.
When people talk, they often gesture to add emphasis and help clarify what they are saying. People who speak professionally—lecturers, teachers, politicians—know this either instinctively or learn it from experience, and it’s supported by past studies showing that particular gestures are used for various semantic purposes.
But what gestures are most effective in getting listeners to pay close attention to what a speaker is saying?
To find out, Columbia University computer science and biomedical engineering researchers, led by John Zhang, a PhD student of Prof. John R. Kender, set up an experiment to correlate the level of audience interest with specific hand gestures made by a speaker. The researchers asked 20 participants—equally divided along gender and political party lines—to view video clips of the 2012 presidential debates between Obama and Romney. The intent was to correlate the candidates’ gestures with the level of interest exhibited by the audience.
Gestures were automatically extracted from the video using an algorithm that incorporated computer-vision techniques to identify and track hand motions and distinguish left and right hands even when the hands were clasped. The algorithm also detected gestural features the researchers suspected beforehand might correlate with audience attention, specifically velocity but also change in direction and what is called an invitation feature, where the hands are spread in an open manner. (In previous work studying teacher gestures, the authors were the first to identify this type of gesture.)
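The paper does not publish its extraction pipeline as code, but two of the features named above—velocity and change in direction—can be sketched from a tracked hand trajectory. The following Python snippet is an illustrative stand-in, not the authors’ implementation; the function name, the frame rate parameter, and the use of per-frame (x, y) pixel coordinates are all assumptions.

```python
import numpy as np

def gesture_features(positions, fps=30.0):
    """Sketch of per-frame gesture features from a tracked hand trajectory.

    `positions` is an (N, 2) sequence of (x, y) pixel coordinates for one
    hand, one row per video frame. Returns speed (pixels/second) and the
    absolute change in movement direction (radians) between frames.
    """
    positions = np.asarray(positions, dtype=float)
    deltas = np.diff(positions, axis=0)           # frame-to-frame displacement
    speed = np.linalg.norm(deltas, axis=1) * fps  # pixels per second
    # Direction change: difference between consecutive displacement angles,
    # unwrapped so a small turn near the +/-pi boundary isn't exaggerated.
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])
    dir_change = np.abs(np.diff(np.unwrap(angles)))
    return speed, dir_change
```

A straight-line motion, for example, yields constant speed and zero direction change, while a sharp turn shows up as a spike in `dir_change`.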
Another feature was the likelihood of a given hand position. Though gestures vary greatly from person to person, individuals have their own habits. They may move their hands up and down or left and right, but they tend to do so within a defined area, or home base, only occasionally moving their hands outside it. These non-habitual, or extremal, poses became an additional feature.
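The home-base idea can be illustrated with a simple stand-in: estimate the home base as the mean hand position and flag points unusually far from it. The paper models the probability of a gesture occurring at a given distance from home base; this sketch replaces that model with a crude standard-deviation cutoff, so the threshold and estimator here are assumptions, not the researchers’ method.

```python
import numpy as np

def extremal_poses(positions, threshold=2.0):
    """Flag hand positions far from a speaker's 'home base'.

    The home base is estimated here simply as the mean hand position;
    a point counts as extremal when its distance from that mean exceeds
    the average distance by `threshold` standard deviations.
    Returns a boolean array, one entry per sampled position.
    """
    positions = np.asarray(positions, dtype=float)
    home = positions.mean(axis=0)                       # crude home-base estimate
    dists = np.linalg.norm(positions - home, axis=1)    # distance from home base
    cutoff = dists.mean() + threshold * dists.std()
    return dists > cutoff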
Each point indicates a hand location (sampling was done every 1 second). Curves indicate the probability of a gesture occurring at a specific distance (in pixels) from the candidate’s home base.
From tracking gestures, it was immediately apparent that Romney used his hands more and made more use of extremal gestures. This was true in both debates 1 and 3. (The study did not look at debate 2, where the format—candidates stood and walked while holding a microphone—could bias the gestures.) Obama, widely criticized for his lackluster performance in debate 1, gestured less and made no extremal gestures in the first debate. If gestures alone had decided the winner, Obama would have lost debate 1. (He improved in the third debate, probably after coaching.)
For gauging the level of interest, the researchers chose EEG data since it is a direct, non-intrusive measure of brain activity already known to capture information related to attention. Electrodes were attached to the scalps of the 20 study participants to record brain activity while they watched debate clips. The data capture was carried out in the lab of Paul Sajda, a professor in the Biomedical Engineering department, who also helped interpret the data. (EEG data is not easy to work with. The signals are weak and noisy, and the capture process itself requires an electrostatically shielded room. The data was also voluminous—46 electrodes for 20 participants tracked for 47 minutes with 2,000 data points per second—and was reduced using an algorithm written for the purpose.)
The EEG data contained many overlapping patterns of activity, from which the researchers identified three main components. Two corresponded to specific areas of the brain: the first to electrodes near the visual cortex, where more activity suggested that audience members were actively watching a candidate; the second was generated near the prefrontal cortex, the site of executive function and decision-making, indicating that audience members were thinking and making decisions. (It was not possible to pinpoint a source for the third component.)
Once time-stamped, the EEG data, averaged across subjects, was aligned with gestures in the video so researchers could locate statistically significant correlations between a gesture feature (direction change, velocity, and extremal pose) and a strong EEG component. Moments of engagement were defined as a common neural response across all subjects. Responses not shared with other subjects might indicate lack of engagement (i.e., boredom), as each participant would typically focus on different stimuli.
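The alignment-and-correlation step can be sketched as a Pearson correlation between a per-second gesture-feature series (such as the extremal-pose distances) and the subject-averaged time course of an EEG component. This is an illustrative stand-in, not the researchers’ statistical test; the truncation-based alignment and the function name are assumptions.

```python
import numpy as np

def feature_eeg_correlation(feature, eeg_component):
    """Pearson correlation between a gesture-feature time series and the
    time course of an EEG component averaged across subjects.

    Both inputs are assumed to be sampled on the same time-stamped grid
    (e.g., once per second); here they are simply truncated to a common
    length, standardized, and correlated.
    """
    f = np.asarray(feature, dtype=float)
    e = np.asarray(eeg_component, dtype=float)
    n = min(len(f), len(e))              # align on the shared window
    f, e = f[:n], e[:n]
    f = (f - f.mean()) / f.std()         # standardize both series
    e = (e - e.mean()) / e.std()
    return float(np.mean(f * e))         # Pearson r
```

A value near +1 would mean the EEG component rises and falls in step with the gesture feature; in practice a significance test would be needed before calling any such correlation meaningful.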
Extremal gestures turned out to be the strongest indication of listeners’ engagement. No matter how researchers subdivided the participants—Democrats, Republicans, females, males, all—what really triggered people’s attention was something new and different in the way a candidate gestured.
Points far from “home base” correlated with heightened levels of listener attention (here, Romney’s left hand in the first debate).
This finding that extremal poses correlate with audience engagement should help speakers stress important points.
And it may also provide an automatic way to index video, an increasingly necessary task as the amount of video continues to explode. Video is hard to chunk meaningfully, and video indexing today often relies on a few extracted images and standard fast-forwarding and reversing. An algorithm trained to find extremal speaker gestures might quickly and automatically locate video highlights. Students of online courses, for example, could then easily skip to the parts of a lecture needing the most review.
More work is planned. The researchers looked only at correlation, leaving for future work the task of prediction, where some EEG data is set aside to see whether gestures can predict where engagement occurs. Whether certain words are more likely to get an audience reaction is another area to explore. In a small step in this direction, the researchers looked at words with at least 15 occurrences, finding that two words, “business” and (interest) “rate,” were unusual in their ability to draw attention. Though the data is insufficient for any definite conclusions, it does suggest potentially interesting results.
Nor did researchers look at what the hand was doing, whether it was pointing, forming a fist, or posed in another manner. The current work focused only on hand position, but this was enough to show that gesturing can effectively engage and influence audiences.

– Linda Crane