COMS 6998: Advanced Topics in Spoken Language Processing

Instructor: Julia Hirschberg

Time:  Tu 4:10-6:00 (Fall 2022)

Location: Mudd 1127



Prerequisite: COMS 4705 or another speech or NLP class

Description:  This class will introduce students to spoken language processing:  basic concepts, analysis approaches, and applications.  Applications include Text-to-Speech Synthesis, dialogue systems, and analysis of entrainment, personality, emotion, humor and sarcasm, deception and charisma.


Required readings:

Jurafsky & Martin 2021 (3rd edition) chapters

These and other readings are linked from this syllabus for each class.


Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley.  2011.



A list of resources can be found here.


Office Hours

Julia Hirschberg: Th 4-5pm (in CEPSR 705)

Lin Ai:  Tu 2-3pm (zoom)

Run Chen: W 4:30-5:30 (zoom)

Arushi Sahai: F 11:30-2:301-2pm (in CEPSR 7LW3) and on zoom

Grade Breakdown

20% weekly posts

20% HW1

30% HW2

30% HW3


Also please note our late policies:

For weekly posts:  Monday deadline 11:59pm; 1 late day allowed but 1 point lost

For homeworks: 3 late days allowed but 5 points lost for each late day


Academic Integrity

The SEAS academic integrity policy is found here.

The CS academic integrity policy is found here.


Note: Schedule and readings are subject to change.  Readings labeled with * are optional.






Week 1: 9/6

Introduction to Speech Processing


Week 2: 9/13

From Sounds to Language

Jurafsky & Martin Chapter 25 (sections 1-3)

Week 3: 9/20

Acoustics of Speech

Jurafsky & Martin Chapter 25 (sections 4-6)

Week 4: 9/27

Tools for Speech Analysis

Praat Tutorial

Some video tutorials:  here and a larger set here

Download the latest version of Praat

HW1: Praat Recording and Analysis (assigned)

Week 5: 10/4

Analyzing Speech Prosody

ToBI Conventions


Prosody and Meaning

*Guidelines for ToBI Labeling

Week 6: 10/11

Text-to-Speech Synthesis (Rose Sloan). This class will be remote, on zoom.  Please find the zoom link in Courseworks under Zoom Class Sessions.

Jurafsky & Martin Chapter 26 (Introduction, sections 6, 8)

*Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

*Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech

*Where do the improvements come from in sequence-to-sequence neural TTS?

HW1 due

Week 7: 10/18

Spoken Dialogue Systems

Jurafsky & Martin Chapters 22, 23, and 24

HW2 assigned

Week 8: 10/25

Speech Analysis: Charisma, Likability and Style (Andrew Rosenberg)

What Makes a Speaker Charismatic?  Producing and Perceiving Charismatic Speech

"Would You Buy A Car From Me?"-- On the Likability of Telephone Voices

Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation

Week 9: 11/1

Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang) and Emotion Elicitation (Sara (Ziwei) Gong). This class will be remote, on zoom.  Please find the zoom link in Courseworks under Zoom Class Sessions.

Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks

The 6 Types of Basic Emotions and Their Effect on Human Behavior

HW2 due


No class: Election Day



Week 10: 11/15

Speech Analysis: Personality (Michelle Levine) and Mental State

Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis

Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples

Speech Processing Approach for Diagnosing Dementia in an Early Stage

Week 11: 11/22

Speech Analysis: Deception and Trust and Radicalization

Acoustic-Prosodic and Lexical Cues to Deception and Trust:  Deciphering How People Detect Lies

Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features

Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media

HW3 assigned

Week 12: 11/29

Speech Analysis: Entrainment in Spoken Language and Empathetic Conversations

Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions

Nora the Empathetic Psychologist

11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team


Week 13: 12/6

Speech Analysis: Sarcasm (Smaranda Muresan) and Humor in CHoRaL (Shayan Hooshmand) and MoreHumor (Lin Ai)

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

"Sure, I did the right thing": A system for sarcasm detection in speech

"Yeah, right": Sarcasm recognition for spoken dialogue systems

*Why can’t robots understand sarcasm?

Multimodal Indicators of Humor in Video

CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users

HW3 due