COMS 6998: Advanced Topics in Spoken Language Processing

Instructor: Julia Hirschberg

Time: M 4:10-6:00 (Spring 2022)

Location: Mudd 1127

 

 

Prerequisite: COMS 4705 or another speech or NLP class

Description:  This class will introduce students to spoken language processing:  basic concepts, analysis approaches, and applications.

Required readings:

Jurafsky & Martin 2021 (3rd edition) chapters

These and other readings are linked from this syllabus for each class.

Suggested:

Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley.  2011.

 

Resources:

A list of resources can be found here.

 

Office Hours

Julia Hirschberg: Tu 4:30-5:30pm

Lin Ai: W 12-1pm

Run Chen: Th 2-3pm

Zhecan Wang: F 5-6pm

Grade Breakdown

20% weekly posts

20% HW1

30% HW2

30% HW3

 

Also please note our late policies:

For weekly posts: due Sunday at 11:59pm; 1 late day allowed, with 1 point deducted

For homeworks: 3 late days allowed, with 5 points deducted for each late day

 

Academic Integrity

The SEAS academic integrity policy is found here.

The CS academic integrity policy is found here.

Syllabus

Note: Schedule and readings are subject to change.  Readings labeled with * are optional.

 

Date

Topic

Readings

Assignments

Week 1: 1/24

Introduction to Speech Processing

 

Week 2: 1/31

From Sounds to Language

Jurafsky & Martin Chapter 25 (sections 1-3)

Week 3: 2/7

Acoustics of Speech

Jurafsky & Martin Chapter 25 (sections 4-6)

Week 4: 2/14

Tools for Speech Analysis

Praat Tutorial (Chapter 11 - scripting - is optional)

Download Praat

HW1: Praat Recording and Analysis (assigned)

Week 5: 2/21

Analyzing Speech Prosody

ToBI Conventions

Guidelines for ToBI Labeling

Prosody and Meaning

Week 6: 2/28

Text-to-Speech Synthesis (Rose Sloan)

Jurafsky & Martin Chapter 26 (Introduction, sections 6, 8)

*Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

*Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech

Tacotron: Towards end-to-end speech synthesis (skim)

*Where do the improvements come from in sequence-to-sequence neural TTS?

HW1 due

Week 7: 3/7

Spoken Dialogue Systems

Jurafsky & Martin Chapters 22, 23, and 24

HW2 assigned

Week 8: 3/14

Spring Break: No classes.

 

 

Week 9: 3/21

Speech Recognition (Bhuvana Ramabhadran)

Jurafsky & Martin Chapter 26 (Introduction, sections 1-5, 8)

An Overview of End-to-End Automatic Speech Recognition

Conformer Parrotron: a Faster and Stronger End-to-end Speech Conversion and Recognition Model for Atypical Speech

Week 10: 3/28

Speech Analysis: Entrainment in Spoken Language and Empathy

Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions

Nora the Empathetic Psychologist

11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team

HW2 due

Week 11: 4/4

Speech Analysis: Personality (Michelle Levine) and Mental State

Detecting late-life depression in Alzheimer's disease through analysis of speech and language

A Cross-modal Review of Indicators for Depression Detection Systems

Automatic Recognition of Personality in Conversation

Week 12: 4/11

Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang)

Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks

The 6 Types of Basic Emotions and Their Effect on Human Behavior

HW3 assigned

Week 13: 4/18

Speech Analysis: Sarcasm (Smaranda Muresan) and Humor and CHoRaL (Lin Ai, Shayan Hooshmand)

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

"Sure, I did the right thing": A system for sarcasm detection in speech

"Yeah, right": Sarcasm recognition for spoken dialogue systems

Why can’t robots understand sarcasm?

Multimodal Indicators of Humor in Video

Week 14: 4/25

Speech Analysis: Deception and Trust

Acoustic-Prosodic and Lexical Cues to Deception and Trust:  Deciphering How People Detect Lies

Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features

 

Week 15: 5/2

Speech Analysis: Charisma, Likability and Style

What Makes a Speaker Charismatic?  Producing and Perceiving Charismatic Speech

"Would You Buy A Car From Me?" - On the Likability of Telephone Voices

Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation

HW3 due