The goal of this project is to automatically detect deception using acoustic-prosodic and lexical-syntactic cues. We are interested in exploring the factors that play a role in deception and deception detection, such as culture, gender, and personality. To that end, we have collected a large corpus of deceptive and non-deceptive speech, consisting of conversations between adult native speakers of American English and of Mandarin Chinese. We are applying machine learning techniques to automatically identify deceptive statements, and exploring how deceptive behavior differs across cultures, genders, and personalities.
Collaborators: Julia Hirschberg, Andrew Rosenberg, Michelle Levine, Guozhen An
In conversation, people tend to become similar to their dialogue partner by adopting lexical, acoustic, prosodic, and syntactic characteristics of the interlocutor’s speech. Research shows that this phenomenon, known as entrainment, is associated with task success and dialogue quality. We studied entrainment patterns in the Supreme Court corpus and examined the relationship between trial success and entrainment between lawyers and justices. We used Amazon Mechanical Turk to preprocess the data and excise noisy regions of the audio files that would otherwise skew the analysis. We found that lawyers entrain more than justices, supporting the theory that the less dominant interlocutor is more likely to entrain to the more dominant speaker.
Collaborators: Julia Hirschberg, Rivka Levitan
Automatic identification of speaker traits such as gender, age and emotional state from speech is an important problem for personalized speech-driven services. In this work, we present a novel approach that leverages pitch feature trajectories with the goal of identifying the speaker’s gender with as little speech as possible.
We use the f0 (fundamental frequency) trajectory, the most discriminative feature between male and female speech, but instead of computing summary statistics of the trajectory, we use the entire trajectory as input to the classifier. We model each trajectory as “text” input, with each token corresponding to a binned f0 value. Our results show that the trajectory approach can yield fairly accurate gender predictions from as little as one second of speech.
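The idea of treating an f0 trajectory as "text" can be illustrated with a minimal sketch. The project description does not specify the binning scheme, the frequency range, the handling of unvoiced frames, or the classifier, so everything below is an assumption made for illustration: the function names `f0_to_tokens` and `ngram_counts`, the linear bins between 50 and 400 Hz, the `UV` token for unvoiced frames, and the use of n-gram counts as features.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def f0_to_tokens(
    f0_hz: Iterable[Optional[float]],
    f_min: float = 50.0,    # assumed lower edge of the binning range (Hz)
    f_max: float = 400.0,   # assumed upper edge of the binning range (Hz)
    n_bins: int = 20,       # assumed number of equal-width bins
    unvoiced: str = "UV",   # assumed token for frames with no pitch
) -> List[str]:
    """Map per-frame f0 values to discrete tokens, one token per frame."""
    width = (f_max - f_min) / n_bins
    tokens = []
    for f in f0_hz:
        # Pitch trackers commonly report 0 (or nothing) for unvoiced frames.
        if f is None or f <= 0:
            tokens.append(unvoiced)
            continue
        # Clip to the binning range, then find the bin index.
        clipped = min(max(f, f_min), f_max)
        idx = min(int((clipped - f_min) / width), n_bins - 1)
        tokens.append(f"B{idx:02d}")
    return tokens


def ngram_counts(tokens: List[str], n: int = 2) -> Counter:
    """Count token n-grams, a common feature set for text classifiers."""
    grams: List[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    return Counter(grams)


if __name__ == "__main__":
    # Hypothetical f0 values in Hz, one per analysis frame.
    trajectory = [0.0, 110.0, 118.0, 125.0, 0.0, 210.0]
    tokens = f0_to_tokens(trajectory)
    print(" ".join(tokens))
    print(ngram_counts(tokens, n=2))
```

The resulting token sequence keeps the order of the frames, so a text classifier can pick up on the shape of the pitch contour rather than only its mean or range. Any standard text model could consume these tokens; the n-gram counts here are one simple option, not necessarily the one used in this work.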
Collaborators: Taniya Mishra, Srinivas Bangalore (Interactions LLC)