The Columbia X-Cultural Deception (CXD) Corpus is a collection of within-subject deceptive and non-deceptive speech from native speakers of Standard American English (SAE) and Mandarin Chinese (MC), all speaking in English. The corpus contains 134 conversations between 268 subjects who were not previously acquainted with one another, and was collected to study acoustic, prosodic, and lexical cues to deception.

The goal of our research is to develop techniques to identify deceptive communication in spoken dialogue. As part of this effort, we are studying how acoustic, prosodic, and lexical features of an individual’s speech can be used, together with knowledge of gender, ethnicity, and personality factors, to distinguish deceptive from non-deceptive behavior.