Spoken Language Processing Group

The Spoken Language Processing Group at Columbia, which was established by Prof. Julia Hirschberg, includes several doctoral, master's, and undergraduate students. We pursue research in summarization and information extraction from speech and in emotional speech (deceptive, charismatic, and uncertain or frustrated), in Broadcast News and in an online physics tutoring domain. We also pursue work in speech generation, particularly in the appropriate assignment of intonational features in text-to-speech.

We collaborate closely with other members of the Columbia NLP Group (headed by Prof. Kathleen McKeown) and with members of the Center for Computational Learning Systems (CCLS). We also have close research relationships with other universities and research labs, including AT&T Labs Research, IBM Research, Northwestern University, SRI International, Tilburg University, Istituto di Fonetica e Dialettologia del CNR, the University of Colorado, and the University of Pittsburgh.

Our laboratory is in Schapiro CEPSR 7LW3, where we run studies on human speech production, analyze speech, and build speech technologies.


Please check out our NSF IGERT PhD Fellowships.

Resources and Facilities

The SLPG has facilities for studio-quality audio recording, for video recording, and for state-of-the-art computing.
  • Sound and Video

    Speech data is collected using a Tascam digital audio recorder and Crown headworn microphones. Recording is done in a double-walled soundproof booth generously donated by Agere Systems, through the kindness of Peter Kroon. Video equipment includes a Hitachi DVD Camcorder.
  • Computing

    The group has a recently purchased Sun Fire V210 computing server and shares a new Linux computing cluster and multi-terabyte file server with the NLP Group.
    In addition, the lab houses about a dozen Linux and Windows workstations, most equipped with high-quality sound cards.
  • Other Resources

    The group maintains a growing collection of speech corpora and other databases, collected both at Columbia and elsewhere.

