COMS 6998: Advanced Topics in Spoken Language Processing

Instructor: Julia Hirschberg

Time: M 4:10-6:00 (Spring 2022)

Location: TBD


Prerequisite: COMS 4705 or another speech or NLP class

Description:  This class will introduce students to spoken language processing:  basic concepts, analysis approaches, and applications.

Required readings:

Jurafsky & Martin 2021 (3rd edition) chapters

These and other readings are linked from this syllabus for each class.


Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley.  2011.



A list of resources can be found here.


Grade Breakdown

10% attendance and participation (not required for CVN students)

20% weekly posts

20% HW1

25% HW2

25% HW3

Academic Integrity

The SEAS academic integrity policy is found here.

The CS academic integrity policy is found here.


Week 1: 1/24

Introduction to Speech Processing


Week 2: 1/31

From Sounds to Language

Jurafsky & Martin Chapter 25 (sections 1-3)

Week 3: 2/7

Acoustics of Speech

Jurafsky & Martin Chapter 25 (sections 4-6)

Week 4: 2/14

Tools for Speech Analysis

Praat Tutorial (Chapter 11 - scripting - is optional)

Download Praat

HW1: Praat Recording and Analysis (assigned)

Week 5: 2/21

Analyzing Speech Prosody

ToBI Conventions

ToBI Tutorial

Prosody and Meaning

Week 6: 2/28

Text-to-Speech Synthesis (Rose Sloan)

Jurafsky & Martin Chapter 26 (Introduction, sections 6, 8)

Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech

Tacotron: Towards end-to-end speech synthesis

Where do the improvements come from in sequence-to-sequence neural tts?

HW1 due

Week 7: 3/7

Spoken Dialogue Systems (Run Chen)

Jurafsky & Martin Chapters 22, 23, and 24


Week 8: 3/14

Spring Break: No classes.



Week 9: 3/21

Speech Recognition (Bhuvana Ramabhadran)

Jurafsky & Martin Chapter 26 (Introduction, sections 1-5, 8)

An Overview of End-to-End Automatic Speech Recognition

Conformer Parrotron: a Faster and Stronger End-to-end Speech Conversion and Recognition Model for Atypical Speech


Week 10: 3/28

Speech Analysis: Entrainment in Spoken Language and Empathy

Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions

Nora the Empathetic Psychologist

11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team

Week 11: 4/4

Speech Analysis: Personality (Michelle Levine) and Mental State

Detecting late-life depression in Alzheimer's disease through analysis of speech and language

A Cross-modal Review of Indicators for Depression Detection Systems

Automatic Recognition of Personality in Conversation

HW2 due

Week 12: 4/11

Speech Analysis: Emotion and Sentiment Detection

Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks

The 6 Types of Basic Emotions and Their Effect on Human Behavior

HW3: Emotional Speech Detection (assigned)

Week 13: 4/18

Speech Analysis: Sarcasm and Humor (Lin Ai)

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

"Sure, I did the right thing": A system for sarcasm detection in speech

"Yeah, right": Sarcasm recognition for spoken dialogue systems

Why can’t robots understand sarcasm?

Multimodal Indicators of Humor in Video

Week 14: 4/25

Speech Analysis: Deception and Trust

Acoustic-Prosodic and Lexical Cues to Deception and Trust:  Deciphering How People Detect Lies

Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features


Week 15: 5/2

Speech Analysis: Charisma, Likability and Style

What Makes a Speaker Charismatic?  Producing and Perceiving Charismatic Speech

"Would You Buy A Car From Me?" - On the Likability of Telephone Voices

Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation

HW3 due