CS4706 Spring 2011 - Project Description - "Propose a Domain for a Spoken Dialogue System"

In this assignment, you are going to propose a domain and functionality for a Spoken Dialogue System (SDS).  This system should access an existing data source or service (e.g. Google mail, the Columbia online library services, an airline status site) or a database or service you will provide. Your SDS should allow users to query this data or interact with the service and provide responses to their input.  The system should take as input spoken utterances recognized by an Automatic Speech Recognition (ASR) system that you will build from libraries we provide.  You will build a component to analyze this transcribed speech and produce a semantic representation of it in table form to represent the user input.  You will build a component to interpret this representation and use it to query the database in your domain.  You will build a component to transform the result of the query into a text response, which you will feed to a Text-to-Speech (TTS) system that you will build, again using libraries we provide.  Your SDS should also handle greetings and salutations as well as questions about system capabilities and help requests.  It should be able to ask for additional input from the user as needed.  A sample project description is attached here.
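The second half of this pipeline (semantic table, then database query, then text response) can be sketched as follows. This is only a minimal illustration, assuming a made-up in-memory flight list in place of a real data source; the names FLIGHTS, query_flights, and generate_response and all flight data are hypothetical and are not part of the libraries we provide.

```python
# Hypothetical in-memory "database" with invented flights; a real system
# would query an existing data source or service instead.
FLIGHTS = [
    {"flight": "3090", "destination": "Kansas City",
     "date": "February 6", "departs": "9:15 AM"},
    {"flight": "1422", "destination": "Boston",
     "date": "February 6", "departs": "1:40 PM"},
]


def query_flights(frame):
    """Return the flights matching every concept the user specified.

    `frame` is the semantic representation: a table mapping concept
    names to values, with None for concepts the user did not mention.
    """
    specified = {k: v for k, v in frame.items() if v is not None}
    return [f for f in FLIGHTS
            if all(f.get(k) == v for k, v in specified.items())]


def generate_response(matches):
    """Turn the query result into text to hand to the TTS component."""
    if not matches:
        return "I could not find a matching flight."
    if len(matches) > 1:
        return "I found several flights. Can you tell me more?"
    f = matches[0]
    return (f"Flight {f['flight']} to {f['destination']} "
            f"departs at {f['departs']}.")


frame = {"flight": "3090", "destination": "Kansas City", "date": None}
print(generate_response(query_flights(frame)))
```

Keeping the query step and the response step as separate functions mirrors the separate components described above, so each can be tested without the ASR or TTS parts in place.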

A. Choosing a domain

Choose a domain that is interesting to you and that you think might be interesting or useful to others.  Previous systems built in this class include: a system to provide information on cancer diagnoses to patients; a system to describe the events on a user's Google Calendar; a system to find book locations in Columbia's libraries; a system to provide results of sporting events; and a system to provide weather information.  Voice apps for phones are another source of ideas (see Google Voice Apps for the Android).  Motivate your choice of domain in terms of interestingness, utility, or challenge.

You should describe the functionality you want your system to have:  What kind of queries will it accept?  What kind of information will it provide?  Your domain must have at least 5 degrees of freedom.  That is, your system should be able to handle queries that include at least 5 different types of information, although it need not be able to recognize all 5 types of information from a single query.  For example, a flight arrival system might allow users to specify (1) an airline, (2) a departure city, (3) an arrival city, and (4) a flight number -- 4 degrees of freedom -- although users might specify only 2 or 3 in any given query.  Similarly, if you have a restaurant location system for which users can only specify neighborhood, price range, and star rating, that would only allow for 3 degrees of freedom in a query.  You would have to expand your system to handle, e.g., queries about cuisine type or accessibility to support the target 5 degrees of freedom.  We will refer to these types of information as domain concepts -- types of things that users can ask about and that your system will need to recognize in order to respond appropriately.

B. Designing the input and output of your SDS

Define as clearly as possible what the input and output of your SDS might look like.  You do not have to figure out all the details here, but give it considerable thought.  The more you can describe your plans at this stage, the better we will be able to help you see whether your system is feasible.  Describe the type of input you will accept from users, the domain concepts in your system, and the type of output you want to produce in as much detail as possible.  Identify where you will get the database for your system and how you will access or create that data.

For example, suppose your (very simple) system finds flight departure information for Continental flights departing from EWR.  You will allow users to specify flights by date, flight number, time, and/or destination city.  Your system will have to prompt the user for information (through its TTS component) and then extract the answer from the user's input.  You will have to decide whether to prompt for one piece of information at a time or whether users can decide which information to present and when.  Your system will output the estimated departure time.  You might decide that you will also provide the reason for any delays.  Also, if the user does not specify sufficient information for you to be able to find the departure time of the flight (e.g. doesn't specify the date), you will need to recognize this and prompt for the additional information.

You should provide a Concept Table for your system, showing the domain concepts users can provide in their input and a sample of the values these concepts can take on (later you will be asked to specify these values exhaustively).  For example, a Concept Table for the system described above might be:

Concept          | Value
Destination city |
Flight #         |
Date             |
Time             |

Sample values these concepts can take on might be:

 

Destination city: {Dallas, Boston, Baltimore, Kansas City, Washington, Pittsburgh, Philadelphia,...}
Flight #: {0-9|0-9|0-9|0-9}
Date: {January, February,...,December} {1-31}
Time: {Morning, Afternoon, Evening}

 

Sample Query: "When does Continental flight 3090 leave for Kansas City?"

 

You will need to extract the following information for the Concept Table:

 

Concept          | Value
Destination city | Kansas City
Flight #         | 3090
Date             |
Time             |

Your system cannot look up this flight without a date (unless you decide to assume that the user means 'today' when no date is specified), so you would want either to confirm that assumption ("I assume you mean today's flight?") or to request more information ("What date are you interested in?").
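One way to picture this step is the sketch below, which fills the Concept Table from the sample query and then checks for a missing date. It is a rough illustration only: it uses simple keyword and pattern matching rather than any particular parsing method, the city list is a partial sample, and the names extract_concepts, next_prompt, and PROMPTS are hypothetical.

```python
import re

# A partial sample of concept values; a real system would list them all.
CITIES = ["Dallas", "Boston", "Baltimore", "Kansas City", "Washington",
          "Philadelphia"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
TIMES = ["morning", "afternoon", "evening"]

# Hypothetical prompts for concepts the lookup cannot do without.
PROMPTS = {"date": "What date are you interested in?"}


def extract_concepts(utterance):
    """Fill the Concept Table from a transcribed utterance (None = not given)."""
    text = utterance.lower()
    table = {"destination": None, "flight": None, "date": None, "time": None}
    for city in CITIES:
        if city.lower() in text:
            table["destination"] = city
    # Assumes four-digit flight numbers, as in the sample values above.
    flight = re.search(r"\bflight (\d{4})\b", text)
    if flight:
        table["flight"] = flight.group(1)
    month_pattern = "|".join(m.lower() for m in MONTHS)
    date = re.search(r"\b(%s) (\d{1,2})\b" % month_pattern, text)
    if date:
        table["date"] = f"{date.group(1).capitalize()} {date.group(2)}"
    for t in TIMES:
        if re.search(r"\b%s\b" % t, text):
            table["time"] = t.capitalize()
    return table


def next_prompt(table):
    """Return a follow-up question for the first missing required concept."""
    for concept, prompt in PROMPTS.items():
        if table[concept] is None:
            return prompt
    return None


table = extract_concepts(
    "When does Continental flight 3090 leave for Kansas City?")
print(table)
print(next_prompt(table))
```

For the sample query, the destination and flight number are filled in, the date and time stay empty, and the system falls back to asking for the date.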

You should also think about the difficulties that may be involved in your choice of domain/functionality.  For example, if your system allows users to ask for concepts with too many possible instantiations, it may be hard to recognize them accurately; you may need to limit your coverage or choose another domain.  A system that provides airline arrival information for any flight in the U.S. is going to require accurate recognition of far too many cities for your ASR system to handle, but a system that handles flights by number for one airport and a few airlines may work fine.  You will need to experiment to find out how many possible vocabulary items you can handle as you build your ASR system, but think now about how to limit your functionality if you need to. 
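A quick way to get a feel for vocabulary size is to count the distinct words in your concept values. The sketch below assumes the values are kept in a dictionary keyed by concept name (a hypothetical layout) and counts only the words in those values, not the surrounding phrases users will say, so treat the result as a rough lower bound.

```python
def vocabulary(concept_values):
    """Collect the distinct words that appear in any concept value."""
    words = set()
    for values in concept_values.values():
        for value in values:
            words.update(value.lower().split())
    return words


# A small sample of values; your own lists will be much longer.
concepts = {
    "destination": ["Dallas", "Boston", "Kansas City", "Washington"],
    "time": ["Morning", "Afternoon", "Evening"],
}
print(len(vocabulary(concepts)))
```

Rerunning a count like this as you add or remove values shows how each change to your coverage affects what the recognizer has to handle.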

C. Team members and roles

List team members and any particular qualifications they have.  It will be very useful to include someone who has a linguistics or NLP background on your team, as well as people with expertise in simple system building and web apps and people who are good at planning and system design.  However, note that the first prize system in the 2010 class was built by one person who had little experience but was willing to work hard and learn.  Please also check over the instructions for Parts 1, 2, and 3 of the project, from the syllabus pages, so that you have a better idea about what will need to be done.

For each team member, list the project tasks for which they will be responsible.  Tasks might include 'finding an online database' or 'recording your TTS voice' or 'Python scripting' and so on.  These tasks are likely to evolve over the course of the project, but this is a good way to start thinking about the concrete steps your team must take to complete the project and to ensure that the workload is balanced fairly.

D. Example project descriptions

Virtual Jukebox

MLB Ticket Search

E. Submission instructions

Submit a single team project description in CourseWorks, in the "Project Descriptions" folder under Class Files, by midnight on the date due, February 6, 2012.  PDF preferred.