AudioMail System: E-mail by Phone

Serge Shamis
Columbia University
shamis@ms.com
212-762-2139

 

Contents

Abstract

Introduction

Related Work

Architecture

Program Documentation

Task List

Possible Enhancements

References

Abstract

In the modern world, there is a growing need to integrate Internet services with the "old", established technologies of everyday life. The AudioMail system takes a step in that direction by bringing Internet e-mail to the familiar interface of telephone voice mail. AudioMail is a pluggable component that provides an interface for reading a user's e-mail over the phone. Just like a regular voice-mail service, it takes its user input as dual-tone multifrequency (DTMF) signals, while its output is e-mail messages read aloud.

Introduction

Let us take a real-life example. Suppose you are going on vacation to Spain for two weeks. Even though vacations are not meant for this, you would like to check your messages at work periodically, just to see if there is anything urgent to which you might have to respond. Of course, your company has a free dial-in number in Spain which you can call to check your voice mail. But this is the 1990s, the Internet Age. Most messages you now receive are e-mail, not voice. Well, you could take a laptop, since your company also has a dial-in modem pool for checking e-mail, but then you would have to drag a laptop along for the sole purpose of reading your messages. What do you do?

One solution is a voice e-mail system. The idea is that you can check your e-mail over the phone, much as you would check your voice mail: the e-mail messages are simply read to you aloud. The AudioMail system provides the core software engine needed for a voice e-mail system. The user input in AudioMail consists of dual-tone multifrequency (DTMF) signals, which are the tones generated by the buttons of a touch-tone phone. The output of the system is a spoken rendition of the e-mail messages, generated at run time. The AudioMail system can be combined with a telephone gateway to provide a complete voice e-mail service. Another application of AudioMail is as an alternative e-mail interface for blind and visually impaired users; in that case, a simple pocket tone dialer that generates DTMF signals can serve as the input device.

Note: Because the analog phone line interface is currently unavailable (it is in use by another group), AudioMail does not yet integrate with an actual telephone gateway; that integration should be easy to add once the line becomes available.

Related Work

Several other people at Columbia University are concurrently working on similar projects. Two other teams, one including Jack Hsu and Jeff Stutz and the other including Jeremy Blumenfeld and Miriam Tauil, are also building "e-mail by phone" systems. Francesco Caruso and Xin Jin are working on related projects that provide Web pages by phone.

Architecture

The AudioMail system is made up of several components:

The first is a DTMF recognition component, based on the DTMF detection functions that Oertel supplies as part of its Call Server software.
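The Oertel detection functions themselves are not reproduced here. As a rough illustration of what DTMF recognition involves, the sketch below swaps in the Goertzel algorithm, a common textbook technique for measuring signal power at a single frequency, evaluates it at the eight standard DTMF frequencies, and picks the strongest low-group and high-group tone. It assumes the audio has already been captured as a block of samples at a known sample rate; the function names and the 0.1 detection threshold are hypothetical, and a production detector would also check tone duration, relative tone levels (twist), and harmonics.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard DTMF low-group (row) and high-group (column) frequencies in Hz.
const std::array<double, 4> kRowHz = {697.0, 770.0, 852.0, 941.0};
const std::array<double, 4> kColHz = {1209.0, 1336.0, 1477.0, 1633.0};
const char kKeys[4][4] = {{'1', '2', '3', 'A'},
                          {'4', '5', '6', 'B'},
                          {'7', '8', '9', 'C'},
                          {'*', '0', '#', 'D'}};

// Goertzel algorithm: squared magnitude of the block's spectrum at freqHz.
double goertzelPower(const std::vector<double>& x, double freqHz, double sampleRate) {
    assert(sampleRate > 0.0);
    const double pi = std::acos(-1.0);
    const double coeff = 2.0 * std::cos(2.0 * pi * freqHz / sampleRate);
    double s1 = 0.0, s2 = 0.0;  // the two previous filter states
    for (double sample : x) {
        const double s = sample + coeff * s1 - s2;
        s2 = s1;
        s1 = s;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Returns the detected key, or '\0' if no DTMF tone pair is clearly present.
char detectDtmfKey(const std::vector<double>& x, double sampleRate) {
    double energy = 0.0;
    for (double sample : x) energy += sample * sample;
    if (x.empty() || energy <= 0.0) return '\0';

    std::size_t bestRow = 0, bestCol = 0;
    double rowPower = 0.0, colPower = 0.0;
    for (std::size_t i = 0; i < 4; ++i) {
        const double pr = goertzelPower(x, kRowHz[i], sampleRate);
        const double pc = goertzelPower(x, kColHz[i], sampleRate);
        if (pr > rowPower) { rowPower = pr; bestRow = i; }
        if (pc > colPower) { colPower = pc; bestCol = i; }
    }
    // For a clean two-tone signal each tone's power is about N * energy / 4;
    // both tones must reach a (hypothetical) fraction of that level.
    const double n = static_cast<double>(x.size());
    const double threshold = 0.1 * n * energy;
    if (rowPower < threshold || colPower < threshold) return '\0';
    return kKeys[bestRow][bestCol];
}
```

For example, a 205-sample block at 8000 Hz (about 25 ms) containing equal-amplitude 770 Hz and 1336 Hz tones is detected as '5', while a block holding only the 770 Hz tone yields '\0'.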

The second, a mail-parsing class, reads the user's e-mail and builds an internal data structure holding the relevant headers and the bodies of all the messages. The current implementation assumes that the user has a mail spool file, identified by the environment variable MAIL. Creating that file, if it does not already exist, is left outside the program; existing utilities such as movemail or fetchmail can do it, which keeps the system more flexible.
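A minimal sketch of such a parser, assuming the spool is in the traditional mbox format, where each message begins with a line starting with "From " and its headers end at the first blank line. The MailMessage structure and the headerValue, parseSpool, and loadUserSpool names are hypothetical, not the names used in AudioMail. The sketch keeps only the From, Subject, and Date headers, ignores folded header lines, treats every line beginning with "From " as a separator, and does not undo the ">From " escaping used inside message bodies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Headers and body of one message, as they would be read aloud.
struct MailMessage {
    std::string from;
    std::string subject;
    std::string date;
    std::string body;
};

// If line starts with name (e.g. "Subject:"), stores the value in out,
// without leading blanks, and returns true.
bool headerValue(const std::string& line, const std::string& name, std::string& out) {
    if (line.rfind(name, 0) != 0) return false;
    const std::size_t start = line.find_first_not_of(" \t", name.size());
    out = (start == std::string::npos) ? std::string() : line.substr(start);
    return true;
}

// Splits an mbox-style spool into messages.
std::vector<MailMessage> parseSpool(std::istream& in) {
    std::vector<MailMessage> messages;
    std::string line;
    bool inHeaders = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.rfind("From ", 0) == 0) {  // separator line starts a new message
            messages.emplace_back();
            inHeaders = true;
            continue;
        }
        if (messages.empty()) continue;  // ignore anything before the first separator
        MailMessage& msg = messages.back();
        if (inHeaders) {
            if (line.empty()) {
                inHeaders = false;  // a blank line ends the header block
                continue;
            }
            // At most one of these matches; all other headers are ignored.
            headerValue(line, "From:", msg.from);
            headerValue(line, "Subject:", msg.subject);
            headerValue(line, "Date:", msg.date);
        } else {
            msg.body += line;
            msg.body += '\n';
        }
    }
    return messages;
}

// Reads the spool named by $MAIL; returns no messages if it is unset or unreadable.
std::vector<MailMessage> loadUserSpool() {
    const char* path = std::getenv("MAIL");
    if (path == nullptr) return {};
    std::ifstream file(path);
    return parseSpool(file);
}
```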

The third, the AudioMail class, contains the "main loop" of the program: it integrates all the pieces and provides the user interface. AudioMail uses the Bell Labs TTS (Text-To-Speech) software to read the mail messages aloud, that is, to generate audio output from a text buffer. TTS is server-based: a TTS daemon, running on the same machine or on a different one, performs the text-to-speech conversion. The AudioMail class uses the TTSC (TTS Client) Library API to communicate with the TTS daemon.

The main program is very simple: it creates an object of class AudioMail, which then handles everything else.

All source code is written in C++. The system contains about 1100 lines of code.

Program Documentation

The source code for the software can be found at the following location: http://www.cs.columbia.edu/~serge/E6998-03/src/.

The build requires a C++ compiler with the Standard C++ Library and the Standard Template Library (STL). The system has been tested with the g++ compiler, but it should work with other C++ compilers as well.

A Makefile is provided with the source code files. Simply run make to build the executable.

The target platform is a Solaris (SunOS 5.5.1) workstation with audio input and output capabilities (a microphone and a speaker).

Before running AudioMail, start the TTS daemon by running the startTTS script. The daemon can run on the same machine as AudioMail or on a different one. If it runs on a different machine, the environment variable TTS_SERVER must be set so that AudioMail connects to that daemon.

The executable file that is produced is called audioMail. It takes no command-line parameters.

Both input and output are audio-based. Any DTMF device, such as a telephone keypad or a tone dialer, can be used for input. The user is presented with options to read a message, go to the next message, go to the previous message, read the header of the current message, cancel reading, read all headers, etc. "Online help", in the form of audio instructions listing the available options, can be accessed at any time.
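The option handling can be pictured as a small dispatch step inside the main loop. In the sketch that follows, each detected key is mapped to one of the commands listed above and a cursor tracks the current message; instead of calling the TTS client, respond() returns a short description of what would be spoken. The key assignments, the Command, MailboxCursor, and respond names, and the response strings are all hypothetical, since this document does not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The commands described in the usage notes.
enum class Command {
    ReadMessage, NextMessage, PreviousMessage, ReadHeader,
    CancelReading, ReadAllHeaders, Help, Unknown
};

// Hypothetical key assignments; the actual AudioMail keymap is not documented here.
Command commandForKey(char key) {
    switch (key) {
        case '5': return Command::ReadMessage;
        case '6': return Command::NextMessage;
        case '4': return Command::PreviousMessage;
        case '2': return Command::ReadHeader;
        case '*': return Command::CancelReading;
        case '8': return Command::ReadAllHeaders;
        case '0': return Command::Help;
        default:  return Command::Unknown;
    }
}

// Tracks which message the user is on.
class MailboxCursor {
public:
    explicit MailboxCursor(std::size_t count) : count_(count) {}
    bool empty() const { return count_ == 0; }
    std::size_t current() const { return current_; }
    // Both moves return false, leaving the cursor unchanged, at either end.
    bool next() {
        if (current_ + 1 >= count_) return false;
        ++current_;
        return true;
    }
    bool previous() {
        if (current_ == 0) return false;
        --current_;
        return true;
    }
private:
    std::size_t count_;
    std::size_t current_ = 0;
};

// One step of a hypothetical main loop: describes what would be
// passed to the TTS client for the given key.
std::string respond(char key, MailboxCursor& cursor) {
    const Command cmd = commandForKey(key);
    switch (cmd) {
        case Command::Help:          return "list options";
        case Command::CancelReading: return "stop speaking";
        case Command::Unknown:       return "unknown key";
        default:                     break;
    }
    if (cursor.empty()) return "no messages";
    switch (cmd) {
        case Command::NextMessage:
            if (!cursor.next()) return "no next message";
            break;  // announce the header of the new current message
        case Command::PreviousMessage:
            if (!cursor.previous()) return "no previous message";
            break;
        case Command::ReadAllHeaders:
            return "read all headers";
        default:
            break;
    }
    const std::string n = std::to_string(cursor.current() + 1);
    return cmd == Command::ReadMessage ? "read message " + n : "read header " + n;
}
```

With two messages, for example, pressing '6' twice yields "read header 2" and then "no next message". Keeping the keymap in one function would make it easy to change the assignments without touching the navigation logic.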

    1. The current implementation handles only simple text mail messages. Multipart messages, for example, are not handled, even if the attachments themselves contain only text.
    2. The TTS daemon apparently limits the size of its text input buffer. If the text is too long, the daemon complains about "running out of words" and does not process it.
    3. Many e-mails contain text indented with '>' characters, marking it as quoted from someone's original message. TTS reads these characters aloud as "greater than" signs, which can be quite irritating to the user. Some customization would help, so that indented text is announced as the original message instead.
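Limitations 2 and 3 could both be addressed by preprocessing a message body before sending it to the daemon. The sketch that follows is an assumption rather than part of the current system: speakableBody replaces each run of '>'-quoted lines with a spoken "Original message:" announcement (dropping the markers), and splitForTts splits text into pieces no longer than a chosen limit, preferring to break after a sentence end and then at whitespace. The function names and announcement wording are hypothetical, and the daemon's actual buffer size is not known here, so the limit is a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Drops leading '>' quote markers and announces each quoted run once.
std::string speakableBody(const std::string& body) {
    std::istringstream in(body);
    std::string line, out;
    bool inQuote = false;
    while (std::getline(in, line)) {
        const bool quoted = !line.empty() && line[0] == '>';
        if (quoted && !inQuote) out += "Original message:\n";
        if (!quoted && inQuote) out += "End of original message.\n";
        inQuote = quoted;
        std::size_t i = 0;  // skip markers such as "> " or ">> "
        while (quoted && i < line.size() && (line[i] == '>' || line[i] == ' ')) ++i;
        out += line.substr(i);
        out += '\n';
    }
    if (inQuote) out += "End of original message.\n";
    return out;
}

// Splits text into pieces of at most maxChars characters (a hypothetical
// limit), breaking after '.', '!', '?' or a newline if possible, else at
// whitespace, else exactly at maxChars.
std::vector<std::string> splitForTts(const std::string& text, std::size_t maxChars) {
    assert(maxChars > 0);
    std::vector<std::string> chunks;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = std::min(maxChars, text.size() - pos);
        if (pos + len < text.size()) {
            std::size_t cut = text.find_last_of(".!?\n", pos + len - 1);
            if (cut == std::string::npos || cut < pos)
                cut = text.find_last_of(" \t", pos + len - 1);
            if (cut != std::string::npos && cut >= pos) len = cut - pos + 1;
        }
        chunks.push_back(text.substr(pos, len));
        pos += len;
    }
    return chunks;
}
```

Each piece would then be sent to the TTS daemon in turn; for example, splitForTts("One two. Three four five.", 12) yields "One two.", " Three four ", and "five.".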

Task List

All work on this project has been done by Serge Shamis (shamis@ms.com).

Possible Enhancements

The AudioMail system as presented here is only a simple prototype. Possible future enhancements and extensions might include:

References

Bell Labs Text-To-Speech System

Columbia CS Department's telephone gateway

DTMF Background