Computer Science COMSE6998_006_2015_1:
Seminar on Spoken Dialogue Systems
Spring 2015
[Announcements] [General Information]
[Schedule] [Class Presentation]
[Project] [Links and Resources]

Announcements
  • 5/4: Congratulations to the FOOD SPEAK team (Annanya, Leighanne, and Mandi) on winning the best demo prize!
  • 3/20: Think about where you might submit your work. Demo track option: SIGDIAL demonstration (deadline April 30); research track option: EMNLP short paper (deadline June 15).
  • 3/9: Related work and methods descriptions submission deadline is extended to 3/13.
  • 2/8: System descriptions are due 2/23. More details have been added to the deliverables for the system descriptions.
  • 1/26: Class cancelled due to snow.
  • First Class January 26. Welcome to the Seminar on Dialogue Systems!
[Top]

General Information

Description

Seminar on Dialogue Systems introduces students to research on automatic spoken dialogue systems. Dialogue systems process spoken (or typed) user input and respond to the user in natural language. Dialogue systems are used commercially to process customer requests, such as providing train schedule information, customer service for banks and stores, etc. Dialogue systems running on personal/home devices, like Apple’s Siri or Amazon’s Echo, answer open-ended questions. In this course, we will discuss state-of-the-art research on dialogue systems and their components: speech recognition, language understanding, dialogue management, and natural language generation. We will cover different types of dialogue systems, including information providing systems, task-oriented systems, tutoring systems, and multimodal systems. In the course project, the students will have a chance to build their own dialogue system. Classes will be lecture and discussion with an emphasis on group participation. There will be no final exam in this course.

Prerequisites

  • COMS 3133/4/7/9 (Data Structures) or equivalent programming ability in at least one systems or scripting language (e.g. Java, Python)

Instructor: Svetlana Stoyanchev

  • sstoyanchev [who is at] cs [dot] columbia [dot] edu
  • Office Hours: Mondays 2-4 CEPSR 7LW3 (speech lab) or by appointment on Skype
  • Skype ID: svetastenchikova

TA: Victor Soto

  • vsoto [who is at] cs [dot] columbia [dot] edu
  • Office Hours: Tuesdays 4-6 CEPSR 7LW3 (speech lab)

Lectures

  • Mondays 4:10-6:00, Mudd Building Room 545

Grade Breakdown

  • 20% Class participation
  • Please prepare 3 questions/comments for each of the required papers for the week and email them to the TA and instructor before the class. These could be questions about aspects of the research that were interesting or unclear to you, or comments you have on the methodology or results in the work.
  • 30% Class presentation
  • Each student will do a 20-30 minute presentation of one or two papers and lead a critical discussion on key positive and negative aspects. Each student will prepare a list of her/his top five choices of the "discussion" papers and email it to the TA (CC instructor) by Feb 2. We will try to take into account your preferences, so it is important you email your preferences.
  • 50% Course Project
    • Project Description: 5%
    • Related work and Methodology: 5%
    • In-class demonstration: 20%
    • Final paper draft (including experiments and evaluation): 20%
[Top]

Reading schedule

Schedule is tentative and highly subject to change.

Discussion papers:
Date Topics Readings Due dates
1/26 Class Introduction Lecture slides Explore links and videos in the Course Resources
2/2 Task-oriented dialogue systems: information-providing, troubleshooting, making reservations Background reading:
  • Pier Lison's thesis Chapter 2
Discussion papers: Additional papers:

Create account on wit.ai, go through Quickstart guide;
Send a list of 5 preferred papers for your presentation
2/9 Discussion: Speech Recognition and Language Understanding for Dialogue Systems; Lecture slides (presented on 2/2)

Lecture: Dialogue Modelling and Management
Background papers: Discussion papers: Additional papers: Download and install OpenDial; go through the tutorial.
2/16 Discussions: Dialogue Modelling and Management Lecture slides

Background papers: Discussion papers: Additional Papers: Find a team (email the instructor); Make an appointment with the TA or instructor to discuss your project ideas;

2/23 Information Presentation in Dialogue slides (Frameworks summary) Discussion papers: students' questions Additional papers: Project Descriptions Due; Each team gives a 5-minute elevator speech about project in class.
3/2 Evaluation of Dialogue Systems SDSEval_03_02_15.pptx students' questions Background Reading:
  • H.Hastie (2012) Metrics and Evaluation of Spoken Dialogue Systems. In Data-Driven Methods for Adaptive Spoken Dialogue Systems Computational Learning for Conversational Interfaces. Oliver Lemon and Olivier Pietquin (Editors). (available for download at Columbia library)
  • Marilyn A. Walker, Candace Kamm and Diane J. Litman. Towards Developing General Models of Usability with PARADISE. Natural Language Engineering, 2000.
Discussion papers: Additional papers:
Create a github repository for your team and send git repo URL to the instructor and TA
3/9 Error recovery in dialogue systems (Lecture slides) (Students' questions) Background papers: Discussion papers: Additional papers:
Related work and method description due
3/16 Spring Break
3/23 Tutoring dialog systems; adaptation in dialogue systems; Students' questions; discussion on frameworks OpenDial
Discussion papers: Additional papers:
3/30 Entrainment in dialogue Invited speaker: Rivka Levitan, Brooklyn College CUNY.
Background Reading (please read and submit questions for at least 1 of these papers): Discussion papers (please read and submit questions for at least 2 of these papers):

4/6 Search and Dialogue; Voice search; Question answering Guest Speaker: David Elson, Google
Background papers: Discussion papers: Additional Papers:
Make an appointment with the instructor or TA to show demo progress and discuss your project;
4/13 Domain Knowledge Acquisition; Situated Dialogue Systems; Dialogue with Robots Guest lecture: Aasish Pappu, Yahoo labs
Background Reading:
Discussion papers: Additional papers:
4/20 Statistical dialogue systems; Dialogue system competitions: Dialog State Tracking Challenge; REAL challenge Guest lecture: Sungjin Lee, Yahoo Labs.
Required Reading: Discussion papers:
4/27 Multimodal dialogue systems Guest Speaker: Michael Johnston, Interactions Corporation
Background papers: Discussion papers:


5/4 In-class Project Demonstrations Final slides
5/15 Final Draft Due (no extensions)
[Top]

Course Presentation


Prepare 10-20 slides presenting the content of the paper you are assigned. Short-paper presentations will have 10-15 minutes and long-paper presentations will have 20-30 minutes.
  • describe the task addressed by the paper
  • approach proposed by the authors
  • data or system used
  • summarize the related work
  • describe the experiments and results
Prepare critical analysis of the paper using a research review form:
  • Clarity
  • Originality
  • Implementation and soundness
  • Substance
  • Evaluation
  • Meaningful Comparison
  • Impact
  • Recommendation
The presentation sessions will be followed by a panel discussion with the presenters as panelists. Deliverables: 1) send a link to your slides before the class; 2) write a paragraph on each point of the critical analysis; 3) present the slides in class; 4) lead a discussion on the critical analysis of the paper.
[Top]

Project


The project will involve 1) building a spoken dialogue system (SDS) in a domain of your choice; 2) optionally, proposing a research question; and 3) evaluating your system/research question. You may use one of the existing frameworks (e.g. OpenDial, Wit.ai). You may use any architecture/platform that you are familiar with: a stand-alone application, a web application, an Android app, or an iPhone app. The TA and instructor will provide support for the OpenDial Java framework and for Wit.ai using the Python API (which we found to work on Linux but to have installation issues on a Mac).
The SDS should allow the user to ask questions related to your domain. You can structure the interactions you support however you want. You should design your system by first deciding what types of user interactions you will support and then creating a Dialog Flow Diagram, a directed graph showing the System States (e.g. Greeting, Help, Info-request, Exit) as nodes with arcs showing which state can follow which other state (E.g. Greeting can be followed by Info-request or Help or Exit, so there should be arrows connecting the Greeting node to each of these 3). You will identify these states in your SDS by considering the current state, the legal states this state is connected to in the graph, and the user input.
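As an illustration only, a Dialog Flow Diagram like the one described above can be sketched as a small graph in Python. Only the arcs out of Greeting come from the example above; the other arcs, the keyword cues, and the names DIALOG_FLOW, STATE_CUES, and next_state are hypothetical placeholders for your own design and NLU component.

```python
# Hypothetical dialog flow: each state maps to the set of states that may follow it.
# Only the Greeting arcs are taken from the course text; the rest are assumptions.
DIALOG_FLOW = {
    "Greeting": {"Info-request", "Help", "Exit"},
    "Help": {"Info-request", "Exit"},
    "Info-request": {"Info-request", "Help", "Exit"},
    "Exit": set(),
}

# Toy keyword cues standing in for a real NLU component (assumption).
STATE_CUES = {
    "Help": ("help", "what can"),
    "Exit": ("bye", "quit", "exit"),
    "Info-request": ("when", "where", "which", "find", "show"),
}


def next_state(current, user_input):
    """Pick the next state from the current state, its legal successors, and the user input."""
    text = user_input.lower()
    for state, cues in STATE_CUES.items():
        # A state is a candidate only if the graph has an arc current -> state.
        if state in DIALOG_FLOW[current] and any(cue in text for cue in cues):
            return state
    # Nothing matched a legal transition: stay in place so the system can reprompt.
    return current


if __name__ == "__main__":
    state = "Greeting"
    for utterance in ["When does the train leave?", "help", "bye"]:
        state = next_state(state, utterance)
        print(utterance, "->", state)
```

Keeping the graph as data separate from the transition function makes it easy to check the code against the diagram in your report.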
Your SDS should handle multi-turn dialogues (in contrast with a single-turn search request or Q&A).
You should build your application with an idea of evaluation. What will make this application successful?
Your final report will be written in the form of a research paper and will include the following sections:
  • Introduction
  • Related Work
  • Method
  • Experiments
  • Conclusions
Project deliverables will include 1) System description, 2) Related Work and Method description, 3) System demonstration, and 4) Final paper draft. Each deliverable will contribute towards your final paper.
Project report deliverables are submitted by each team using the CourseWorks website. Please submit the documents in PDF format. Please include a section with team member names and the contribution made by each team member.
The code and running instructions will be kept in a GitHub repository. The instructor and TA will monitor your progress by occasionally pulling your current version of the code. Your project grade will take into account how readable and well documented your code is. Include documentation with the instructions for running your system.

1. System Description
This report will form an initial draft of the Introduction section of your final report. Describe the domain that you choose for the system, the system architecture you intend to use, and the dialogue flow, and give an example dialogue with the system.

Your system should engage in an interactive dialogue (i.e. more than one turn of question/answer) and support multiple (at least 4-5) domain concepts (types of things that users can ask about and that your system will need to recognize in order to respond appropriately). For example, a flight arrival system might allow users to specify (1) an airline, (2) a departure city, (3) an arrival city, and (4) a flight number, although users might specify only 2 or 3 in any given query. A restaurant domain system might have neighborhood, price range, star rating, and cuisine.
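One possible way to represent domain concepts, sketched here for the flight arrival example, is a frame whose slots are filled over several turns. This is a minimal sketch, not a required design: the class name FlightQuery and its methods are hypothetical, and the values passed to update() stand in for whatever your NLU component extracts.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FlightQuery:
    """Frame for the flight arrival example; each slot is one domain concept."""
    airline: Optional[str] = None
    departure_city: Optional[str] = None
    arrival_city: Optional[str] = None
    flight_number: Optional[str] = None

    def update(self, **concepts: str) -> None:
        """Fill slots with concepts recognized in the latest user turn."""
        known = {f.name for f in fields(self)}
        for name, value in concepts.items():
            if name not in known:
                raise KeyError(f"unknown domain concept: {name}")
            setattr(self, name, value)

    def missing(self) -> List[str]:
        """Slots the user has not specified yet, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    query = FlightQuery()
    # Turn 1: the user mentions only two of the four concepts.
    query.update(airline="Delta", arrival_city="New York")
    print(query.missing())
    # Turn 2: the user supplies another concept after a system prompt.
    query.update(departure_city="Boston")
    print(query.missing())
```

The list returned by missing() is one simple signal a dialogue manager could use to decide what to ask next in a multi-turn dialogue.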


Example topics for the project include (but are not limited to):
  • A voice interface for an existing API:
    • A calendar system that interfaces with Google Calendar and allows a user to add/remove/query events in the calendar
    • A system that queries weather information
    • A system that answers questions about current events in NYC: find concerts/plays/movies at NYC venues
    • A voice interface for a travel API (e.g. TripAdvisor) that allows users to query hotels
    • Google search API
    • A text collection (you can build an index with a tool such as Lucene and use its API to search)
  • A chatbot system that uses a database on the back-end, e.g.
    • a chat interface for a toy that talks with children
    • a chat interface that may be used in a museum to provide information for visitors

Deliverables: 1) a 5-minute in-class presentation of the project idea; 2) a 1-2 page description of the proposed dialogue system (Introduction part of the paper). The description should include:
  • Motivation for your choice of domain in terms of interestingness, utility, or challenge.
  • System functionality: describe the type of input you will accept from users, the domain concepts in your system, and the type of output you want to produce in as much detail as possible.
  • Front-end: what architecture will the system run on (stand-alone, iPhone, Android)
  • Back-end: identify where you will get the data for the system (which API, database, ontology)
  • Framework: which framework do you intend to use for the system (if any), which ASR, TTS will you use, how will you implement NLU for your system.
  • Research or demo: is your project focusing on system development or on system & research? What research questions do you intend to address?
  • Evaluation: how do you plan to evaluate your system/research question (user study)?
The project will have two options: 1) focus on developing an SDS; 2) focus on a research question. In both cases, you will be developing a system. Indicate the focus of your project in your submission.
Research focus: Describe a research question you intend to investigate using your system. Which experimental conditions will you investigate? (Having 2 experimental conditions is reasonable.) What is your hypothesis (which experimental condition do you expect to work best and why)? How will you evaluate your hypothesis?
Both system & research: Summarize the related work. Describe the evaluation procedure. For example, you may propose to evaluate your experimental system by having 10 people use the system and answer a questionnaire, or by measuring the speech recognition/language understanding performance of the system. Possible research questions include (but are not limited to):
  • Compare system performance using different speech recognisers (e.g. Kaldi, PocketSphinx, Google, Nuance)
  • Compare system performance with different TTS engines (e.g. Festival)
  • Build a statistical NLU for the OpenDial system (this has a practical application), or try connecting the Wit.ai NLU as a module for OpenDial
  • Build a multimodal graphical display for your system (e.g. as a module in OpenDial) and compare voice-only and multimodal conditions
  • Experiment with clarification strategies in your system
  • Experiment with different methods of information presentation or natural language generation
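If you choose to measure speech recognition performance, one widely used metric is word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The course does not prescribe this metric or this implementation; the sketch below is one assumed way to compute it, and the function name word_error_rate and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word ("me") and one substituted word ("boston" -> "austin").
    print(word_error_rate("show me flights to boston", "show flights to austin"))
```

To compare two recognizers, you could average this score over the same set of transcribed test utterances for each one.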

See an example project description from a related course.

2. Related Work and Method Description
Finalize your choice of system architecture. Think through the design of your system. Test the proposed tools to ensure the feasibility of your proposal. At this point we need to make sure that, whether you are using Wit.ai or OpenDial, the framework of your choice will be enough to support all the functionality you are proposing to implement in your system. If you are going to use a third-party API in the back-end of your system, it is also very important that you understand what the input and output of the API are and what you will be able to do with it.

Deliverables: A report with a refined Introduction and drafts of the Related Work and Method sections. The report should show that you thought through the system design and functionality and confirmed the feasibility of your approach by trying out the functionality of the proposed tools. Evaluation criteria:

  • Clearly describe your motivation and goals.
  • Compare your proposed system with previous work in academia and/or industry.
  • Describe the design for your proposed system/experiments.
  • What is the system functionality from a user perspective (what tasks can a user accomplish using the system)?
  • Implementation details: Which ASR/NLU/TTS components is your system using? How will you train/create rules for NLU in your domain?
  • Describe third-party APIs that you will be using in your system, if any.
  • A high-level design of a dialogue manager: what will be the set of states and/or frames?
  • Motivation for your choices of tools/infrastructure (make sure what you are proposing is feasible).
  • Describe how your system accesses the data. What intermediate representations are you using?
  • Describe your evaluation plan.


3. System Demonstration
Develop a functional dialogue system. This system will be used for your experiments.
Deliverables: Please prepare a demonstration of your system to be shown during the class (see the schedule). Each group will have up to 20 minutes to present. I recommend preparing a ~15-minute presentation with 5 minutes for questions. After the presentations, everyone can vote for the best presentation. There will be time to play with other teams' systems. Please submit your documented source code, with a README describing its content, to your GitHub repository.

4. Final Report
Recruit experiment participants (from the class or outside the class) who will be users of your system. Run an evaluation of your system/experimental condition. Analyze and summarize the results in the Evaluation section. In the Conclusions section, summarize your system and what you learned. What would you do differently? How would you extend this work?
Deliverables: Submit a complete paper draft that includes your improved Introduction, Related Work, and Method sections and new Evaluation and Conclusion sections. It will be graded by the TA and the instructor according to the following criteria:
  • Clarity (overall clarity of your paper)
  • Implementation and soundness (method description)
  • Substance (how much work is accomplished)
  • Evaluation
  • Meaningful comparison (related work)
  • Impact of accompanying software (we will look at your github source code and evaluate for potential reusability)
In addition to the paper, please also include an Admin section and a Framework evaluation section with the following information:
Admin Section
  • URL of your github.
  • URL of your final report if you agree for it to be published on the course webpage
  • If you are planning to submit your work to a workshop, please indicate which workshop (I will be glad to help with your paper draft)
  • For each group member, a paragraph description of the project tasks accomplished
Framework evaluation section (OpenDial/WitAI)
  • What are the strengths and weaknesses of the framework you were using?
  • On a scale from 1 (worst) to 10 (best), how easy was it to start using the framework? How easy was it to debug your code?
  • What are the challenges that you discovered while using the framework?
  • For Wit.ai – how much training data did you need to gather to make it usable?
  • For OpenDial – how much effort did it take to build NLU/DM/NLG/external components?
  • What suggestions would you make for the authors of the framework to improve it?
  • What extensions would you suggest for the framework?
  • Would you use this framework for another dialogue system?
[Top]

Links and Resources


Course slides can be found in Dropbox

Research-related:
Dialogue systems-related websites and articles:
Student workshops for potential paper submission:
Videos and talks:
Sample APIs for projects (back-end):
Frameworks and tools for speech-enabled application development:
  • Commercial API (free for research purposes) that provides ASR/NLU capabilities: Wit.ai
  • Research Dialog Framework that provides Dialog management capabilities: OpenDial
  • SoX: for recording audio. "brew install sox" on OS X and "apt-get install sox" on Ubuntu.
Other:





svetlana.stoyanchev [ at ] gmail [dot] com

Design adapted from David Elson's/Smaranda Muresan site design