COMS 6998: Advanced Topics in Spoken Language Processing
Instructor: Julia Hirschberg
Time:  Tu 4:10-6:00 (Fall 2022)
Location: Mudd 1127
 
 
Prerequisite: COMS 4705 or another speech or NLP class
Description:  This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications. Applications include text-to-speech synthesis, dialogue systems, and analysis of entrainment, personality, emotion, humor and sarcasm, deception, and charisma.
 
Required readings:
Jurafsky & Martin 2021 (3rd edition) chapters
These and other readings are linked from this syllabus for each class.
Suggested:
Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley. 2011.
 
Resources:
A list of resources can be found here.
 
Office Hours
Julia Hirschberg: Th 4-5pm (in CEPSR 705)
Lin Ai:  Tu 2-3pm (zoom)
Run Chen: W 4:30-5:30 (zoom)
Arushi Sahai: F 11:30-2:301-2pm (in CEPSR 7LW3) and on zoom
Grade Breakdown
20% weekly posts
20% HW1
30% HW2
30% HW3
 
Also please note our late policies:
For weekly posts: Monday deadline 11:59pm; 1 late day allowed but 1 point lost
For homeworks: 3 late days allowed but 5 points lost for each late day
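
For students who want to estimate their standing, the grade breakdown and late policies above can be sketched as a small calculation. This is only an illustration, not an official grading tool: the function names are hypothetical, each component is assumed to be scored as a percentage (0-100), late deductions are assumed to be on the same point scale as the assignment, and the syllabus does not say what happens once the allowed late days are used up.

```python
# Weights from the grade breakdown, in percent.
WEIGHTS = {"posts": 20, "hw1": 20, "hw2": 30, "hw3": 30}


def late_deduction(days_late, max_late_days, points_per_day):
    """Points lost for a late submission, or None if past the allowed late days."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > max_late_days:
        # Assumption: the syllabus does not state the outcome in this case.
        return None
    return days_late * points_per_day


def homework_deduction(days_late):
    # Homeworks: 3 late days allowed, 5 points lost per late day.
    return late_deduction(days_late, max_late_days=3, points_per_day=5)


def weekly_post_deduction(days_late):
    # Weekly posts: 1 late day allowed, 1 point lost.
    return late_deduction(days_late, max_late_days=1, points_per_day=1)


def course_grade(scores):
    """Weighted course grade, assuming every component is a 0-100 percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, `homework_deduction(2)` gives a 10-point deduction, and `course_grade({"posts": 90, "hw1": 80, "hw2": 85, "hw3": 95})` gives 88.0 under these assumptions.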
 
Academic Integrity
The SEAS academic integrity policy is found here.
The CS academic integrity policy is found here.
Syllabus
Note: Schedule and readings are subject to change. Readings labeled with * are optional.
 
 
  | Date | Topic | Readings | Assignments | 
 
  | Week 1: 9/6 | Introduction to Speech Processing |   |  | 
 
  | Week 2: 9/13 | From Sounds to Language | Jurafsky & Martin Chapter 25 (sections 1-3) |  | 
 
  | Week 3: 9/20 | Acoustics of Speech | Jurafsky & Martin Chapter 25 (sections 4-6) |  | 
 
  | Week 4: 9/27 | Tools for Speech Analysis | Praat Tutorial; some video tutorials: here and a larger set here; download the latest version of Praat | HW1: Praat Recording and Analysis (assigned) | 
 
  | Week 5: 10/4 | Analyzing Speech Prosody | ToBI Conventions; AuToBI Prosody and Meaning; *Guidelines for ToBI Labeling |  | 
 
  | Week 6: 10/11 | Text-to-Speech Synthesis (Rose Sloan). This class will be remote, on zoom. Please find the zoom link in Courseworks under Zoom Class Sessions. | Jurafsky & Martin Chapter 26 (Introduction, sections 6, 8); *Prosody Prediction from Syntactic, Lexical, and Word Embedding Features; *Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech; *Where do the improvements come from in sequence-to-sequence neural TTS? | HW1 due | 
 
  | Week 7: 10/18 | Spoken Dialogue Systems | Jurafsky & Martin Chapters 22, 23, and 24 | HW2 assigned | 
 
  | Week 8: 10/25 | Speech Analysis: Charisma, Likability and Style (Andrew Rosenberg) | What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech; "Would You Buy A Car From Me?"-- On the Likability of Telephone Voices; Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation |  | 
 
  | Week 9: 11/1 | Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang) and Emotion Elicitation (Sara (Ziwei) Gong). This class will be remote, on zoom. Please find the zoom link in Courseworks under Zoom Class Sessions. | Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks; The 6 Types of Basic Emotions and Their Effect on Human Behavior | HW2 due | 
 
  | 11/8 | No class: Election Day |   |   | 
 
  | Week 10: 11/15 | Speech Analysis: Personality (Michelle Levine) and Mental State | Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis; Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples; Speech Processing Approach for Diagnosing Dementia in an Early Stage |  | 
 
  | Week 11: 11/22 | Speech Analysis: Deception and Trust and Radicalization | Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies; Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features; Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media | HW3 assigned | 
 
  | Week 12: 11/29 | Speech Analysis: Entrainment in Spoken Language and Empathetic Conversations | Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions; Nora the Empathetic Psychologist; 11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team |   | 
 
  | Week 13: 12/6 | Speech Analysis: Sarcasm (Smaranda Muresan) and Humor in CHoRaL (Shayan Hooshmand) and MoreHumor (Lin Ai) | “Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space; "Sure, I did the right thing": A system for sarcasm detection in speech; "Yeah, right": Sarcasm recognition for spoken dialogue systems; *Why can’t robots understand sarcasm?; Multimodal Indicators of Humor in Video; CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users | HW3 due | 
 