Spoken Language Processing Group
The Spoken Language Processing Group at Columbia, which was
established by Prof. Julia Hirschberg, includes several doctoral, master's, and undergraduate students. We pursue research in summarization and information extraction from speech, and in emotional speech (deceptive, charismatic, and uncertain or frustrated), in Broadcast News and in an online physics tutoring domain. We also pursue work in speech generation, particularly the appropriate assignment of intonational features in text-to-speech.
We collaborate closely with other members of the Columbia NLP Group (headed by
Prof. Kathleen McKeown) and with members of the Center for Computational Learning Systems (CCLS). We also
maintain close research collaborations with other
universities and research labs, including AT&T Labs Research, IBM
Research, Northwestern University, SRI International, Tilburg
University, Istituto di Fonetica e Dialettologia del CNR, the University
of Colorado, and the University of Pittsburgh.
We have a laboratory in Schapiro CEPSR 7LW3,
where we perform laboratory studies on human speech production, analyze
speech, and build speech technologies.
Please check out our NSF IGERT PhD Fellowships.
Resources and Facilities
The SLPG has facilities for studio-quality audio recording, for video
recording, and for state-of-the-art computing.
Sound and Video
Speech data is collected using a Tascam digital audio recorder and Crown
headworn microphones. Recording is done in a double-walled soundproof
booth generously donated by Agere Systems, through the kindness of Peter
Kroon. Video equipment includes a Hitachi DVD Camcorder.
The group has a recently purchased Sun Fire V210 computing server, and
shares a new Linux computing cluster and multi-terabyte file server with
the NLP Group.
In addition, the lab houses about a dozen Linux and Windows
workstations, most equipped with high-quality sound cards.
The group maintains a growing collection of speech corpora and other
databases, collected both at Columbia and elsewhere.