COMS 6998:
Advanced Topics in Spoken Language Processing
Instructor: Julia Hirschberg
Time: Tu 4:10-6:00 (Spring 2023)
Location: Mudd 633
 
 
Prerequisite: COMS 4705 or another speech or NLP class
Description:  This class will introduce students to spoken language
processing:  basic concepts, analysis approaches, and
applications.  Applications include Text-to-Speech Synthesis,
dialogue systems, and analysis of entrainment, empathy, personality, emotion,
humor and sarcasm, deception and trust, radicalization and charisma, all using
text and speech information and some visual features as well.
 
Required readings:
Jurafsky & Martin 2023 (3rd edition draft) chapters.
These and other readings are linked from this syllabus for each class.
Suggested:
Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley. 2011.
 
Resources:
A list of resources can be found here.
 
Office Hours
Julia Hirschberg: Th 1-2:30pm
Debasmita Bhattacharya: W 3-5pm
Yu-Wen Chen: F 2-4pm
Ziwei (Sara) Gong: Tu 1-3pm
Grade Breakdown
20% weekly posts
20% HW1
30% HW2
30% HW3
 
Also please note our late policies:
For weekly posts: Monday deadline 11:59pm; 1 late day allowed, but 1 point lost.
For homeworks: 3 late days allowed, but 5 points lost for each late day.
 
Academic Integrity
The SEAS academic integrity policy is found here.
The CS academic integrity policy is found here.
Syllabus
Note: Schedule and readings are subject to change. Readings labeled with * are optional.
 
 
  | Date | Topic | Readings | Assignments | 
 
  | Week 1: 1/17 | Introduction to Speech Processing |   |   |
 
  | Week 2: 1/24 | From Sounds to Language | Jurafsky & Martin Chapter 28 (sections 1-3) |   |
 
  | Week 3: 1/31 | Acoustics of Speech | Jurafsky & Martin Chapter 28 (sections 4-6) |   |
 
  | Week 4: 2/7 | Tools for Speech Analysis | *Praat Tutorial (just use for reference); Watch all these Praat video tutorials here (1-7); *Also some video tutorials on acoustics of speech here; Download the latest version of Praat; Record your own voice saying these sentences | HW1: Praat Recording and Analysis (assigned) |
 
  | Week 5: 2/14 | Analyzing Speech Prosody | ToBI Conventions; AuToBI; Prosody and Meaning; *Guidelines for ToBI Labeling |   |
 
  | Week 6: 2/21 | Text-to-Speech Synthesis (Rose Sloan, Bard College): This class will be on Zoom. | Jurafsky & Martin Chapter 16 (Introduction, sections 6, 8); *Prosody Prediction from Syntactic, Lexical, and Word Embedding Features; *Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech; *Where do the improvements come from in sequence-to-sequence neural tts? | HW1 due |
 
  | Week 7: 2/28 | Spoken Dialogue Systems | Jurafsky & Martin Chapters 14, 15, 27 |   |
 
  | Week 8: 3/7 | Speech Analysis: Entrainment and Empathy in Spoken Language | Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions; Nora the Empathetic Psychologist; 11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team | HW2 assigned |
 
  | Week 9: 3/14 | Spring Break: No classes |   |   | 
 
  | Week 10: 3/21 | Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang, Apple; Emotion Elicitation, Sara (Ziwei) Gong, Columbia): This class will be on Zoom. | Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks; Emotions and Types of Emotional Responses |   |
 
  | Week 11: 3/28 | Speech Recognition: Speech model personalization and its application to dysarthric speech: the journey from research to production (Fadi Biadsy, Google) | Jurafsky & Martin Chapter 16 (Introduction, sections 1-5, 7-8); Listen, Attend and Spell; Attention is All You Need; *Conformer: Convolution-augmented Transformer for Speech Recognition; *Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation; *A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization | HW2 due |
 
  | Week 12: 4/4 | Speech Analysis: Personality (Michelle Levine, Columbia) and Mental State | Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis; Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples; Speech Processing Approach for Diagnosing Dementia in an Early Stage |   |
 
  | Week 13: 4/11 | Speech Analysis: Sarcasm (Smaranda Muresan, Columbia) and Humor (Lin Ai, Columbia) | “Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space; "Sure, I did the right thing": A system for sarcasm detection in speech; "Yeah, right": Sarcasm recognition for spoken dialogue systems; *Why can’t robots understand sarcasm?; Multimodal Indicators of Humor in Video; CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users | HW3 assigned |
 
  | Week 14: 4/18 | Speech Analysis: Charisma, Likability and Style (Andrew Rosenberg, Google) | What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech; "Would You Buy A Car From Me?" -- On the Likability of Telephone Voices; Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation |   |
 
  | Week 15: 4/25 | Speech Analysis: Deception and Trust; Radicalization (Lin Ai, Columbia) | Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies; Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features; Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media | HW3 due |