Controlling Devices By Phone

Samit Jain
Columbia University
New York


The Columbia InterNet Extensible Multimedia Architecture (CINEMA) provides a test bed for multimedia and Internet telephony applications. This project adds a module to CINEMA that lets users control devices by phone. Users can call in from a SIP phone or a regular phone and control remote devices such as lights and appliances. The system builds on several technologies, including text-to-speech, VoiceXML, and SIP.

Users can query the current status of a device and can also subscribe to events. For example, a user might want to be notified when the doorbell in his house rings while he is not at home. Event notifications are designed to be delivered to the user's phone or to other contact addresses; in the current implementation, they can only be sent to the user's phone. The system calls the user and plays the notification. This can easily be extended to send notifications to other SIP user agents, such as sipc.

Introduction


The Session Initiation Protocol (SIP) is a signaling protocol used for establishing and terminating Internet telephony calls. VoiceXML is a language designed to create audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, and recording of audio for telephony applications. Sipvxml is a SIP-based VoiceXML browser. Users can connect to the browser using SIP and take part in application-defined interactive voice response dialogs. Through a SIP-PSTN gateway, it also brings the advantages of VoiceXML to users of regular telephones.

The Columbia SIP user agent (sipc) can control network appliances through a device gateway such as the SIP-X10 device gateway. However, it lacks a good user interface for displaying the status of a device and other information about it. In addition, all information about devices and device control commands is stored locally.

In this project we integrate the two technologies, Sipvxml and the existing device control mechanism, so that a user can control devices from a regular phone. The approach has potential for wide use because the user can control a device from anywhere by phone, which is a significant advantage over controlling devices from sipc. In addition, all information about users, devices, device types, control commands, device status, and locations for a specific network domain is stored in a centralized database. This allows us to authenticate users and to authorize them for the specific devices they are allowed to control, which matters because security is an important concern for a remote device control system. We also monitor all remote operations on a device and maintain status information for the device, which can be reported to the user. Finally, we have implemented an event notification system for device control. A user can submit a subscription for specific events on a device. When such an event occurs, that is, when another user executes the corresponding command on the device, the subscriber is notified at his primary contact address in the database.

The report is organized as follows:
1) Introduction
2) Related Work 
3) Architecture 
4) Program Documentation 
5) Future work 
6) References

Related Work

Earlier projects in the IRT lab have dealt with VoiceXML, and others with device control. This project differs in that it controls devices by phone and combines existing SIP technologies, specifically VoiceXML and device control using the SIP DO message.

Yahoo Calendar provides a good reference for a simple event notification system.

Architecture


The basic architecture of the system is shown below:

Most of the implementation resides in the web server component, although some modifications were also required in other modules: in the VoiceXML implementation, to initiate a call to a user for notification purposes, and in sipd, to handle SIP DO requests properly. The web server component has the following modules:

Common Gateway Interface (CGI) scripts: The VoiceXML interpreter sends HTTP requests to the web server, which runs the corresponding CGI scripts. The web server is also responsible for communicating with the device gateway, that is, for sending SIP DO requests and receiving the responses. It monitors the device control operations requested by different users and, based on their success or failure, maintains status information about devices and other related information. It sends responses back to the browser as VoiceXML pages that contain the subsequent dialogs between the system and the user. Finally, it handles event notifications. Users can submit subscriptions to the system. When a command executes successfully, the server starts a new process that checks the subscriptions in the database and notifies the subscribed users. A notification is delivered by sending a SIP REFER message, along with the notification dialog, to the VoiceXML interpreter. On receiving this message, the interpreter initiates a new call to the user to be notified, in a separate session, and plays the notification. The server also enforces concurrency control between different users operating on the same device.
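
As an illustration of the VoiceXML pages that the CGI scripts return, the following is a minimal sketch of a device menu generator. It is written in Python for readability and is not the project's Tcl code. The function name device_menu_page, the script name command_menu.cgi, the device_id query parameter, and the use of VoiceXML 2.0 syntax are all assumptions made for this example.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr


def device_menu_page(devices, next_script="command_menu.cgi"):
    """Return a VoiceXML menu with one DTMF choice per authorized device.

    devices is a list of (device_id, spoken_name) pairs. next_script is a
    hypothetical CGI script that would handle the user's selection.
    """
    if len(devices) > 9:
        # Keys 1-9 only; longer menus would need paging or multi-digit input.
        raise ValueError("this sketch supports at most nine single-key choices")

    prompts = []
    choices = []
    for key, (device_id, name) in enumerate(devices, start=1):
        url = next_script + "?" + urlencode({"device_id": device_id})
        prompts.append("For %s, press %d." % (escape(name), key))
        choices.append('    <choice dtmf="%d" next=%s>%s</choice>'
                       % (key, quoteattr(url), escape(name)))

    lines = [
        '<?xml version="1.0"?>',
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">',
        '  <menu id="device_menu">',
        '    <prompt>' + " ".join(prompts) + '</prompt>',
    ]
    lines.extend(choices)
    lines.extend(['  </menu>', '</vxml>'])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A CGI script emits a Content-Type header, a blank line, then the page.
    print("Content-Type: text/xml")
    print()
    print(device_menu_page([(3, "living room lamp"), (7, "coffee maker")]))
```

Each choice points the browser at the next script with the selected device identifier, so the dialog state travels in the URL rather than in the server.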

Database: The database is centralized for all users of the system. It contains information about users, devices, commands, events, and subscriptions. We use the MySQL database system.
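
To make the role of the central database concrete, here is a minimal sketch of a possible schema together with the two lookups the server relies on: the devices a user may control, and the contacts to notify after a command succeeds. The table and column names are hypothetical, Python's built-in sqlite3 module stands in for MySQL, and excluding the acting user from notifications is an assumption based on the description of events as commands executed by another user.

```python
import sqlite3

# Hypothetical schema; the real system stores more (device types, locations).
# PINs are stored in plain text here only to keep the sketch short.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, pin TEXT UNIQUE, contact TEXT);
CREATE TABLE devices (device_id INTEGER PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE permissions (user_id INTEGER, device_id INTEGER);
CREATE TABLE subscriptions (user_id INTEGER, device_id INTEGER, command_id INTEGER);
"""


def authorized_devices(conn, pin):
    """Return (device_id, name, status) rows for the user with this PIN."""
    rows = conn.execute(
        "SELECT d.device_id, d.name, d.status FROM users u "
        "JOIN permissions p ON p.user_id = u.user_id "
        "JOIN devices d ON d.device_id = p.device_id "
        "WHERE u.pin = ? ORDER BY d.device_id",
        (pin,))
    return rows.fetchall()


def contacts_to_notify(conn, device_id, command_id, acting_user_id):
    """Return contact addresses subscribed to this command on this device."""
    rows = conn.execute(
        "SELECT u.contact FROM subscriptions s "
        "JOIN users u ON u.user_id = s.user_id "
        "WHERE s.device_id = ? AND s.command_id = ? AND s.user_id != ? "
        "ORDER BY u.user_id",
        (device_id, command_id, acting_user_id))
    return [contact for (contact,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, "1234", "sip:alice@example.com"),
                      (2, "5678", "sip:bob@example.com")])
    conn.execute("INSERT INTO devices VALUES (10, 'lamp', 'off')")
    conn.executemany("INSERT INTO permissions VALUES (?, ?)", [(1, 10), (2, 10)])
    conn.execute("INSERT INTO subscriptions VALUES (2, 10, 1)")
    print(authorized_devices(conn, "1234"))
    print(contacts_to_notify(conn, 10, 1, acting_user_id=1))
```

An unknown PIN simply yields an empty device list, so authentication and authorization reduce to the same query in this sketch.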

SIP library: The library provides the SIP user agent interface for the web server and lets it communicate with the SIP-X10 device gateway directly or via a proxy. We use a SIP library written in Tcl because the server scripts are written in Tcl.
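
The request that the web server hands to the SIP library can be pictured with the following minimal sketch, which assembles a DO request as text and classifies response codes. It is written in Python for illustration and does not reproduce the project's Tcl library. The host names, the URIs, and in particular the text/plain body "command=on" are assumptions; the actual payload format expected by the SIP-X10 gateway is not described here.

```python
import uuid


def build_do_request(device_uri, from_uri, body,
                     local_host="webserver.example.com",
                     content_type="text/plain"):
    """Return a SIP DO request as bytes, with CRLF line endings.

    The header set follows the general SIP request layout; the body format
    and content type are placeholders chosen for this sketch.
    """
    payload = body.encode("utf-8")
    head = [
        "DO %s SIP/2.0" % device_uri,
        "Via: SIP/2.0/UDP %s;branch=z9hG4bK%s" % (local_host, uuid.uuid4().hex),
        "Max-Forwards: 70",
        "From: <%s>;tag=%s" % (from_uri, uuid.uuid4().hex[:8]),
        "To: <%s>" % device_uri,
        "Call-ID: %s@%s" % (uuid.uuid4().hex, local_host),
        "CSeq: 1 DO",
        "Content-Type: %s" % content_type,
        "Content-Length: %d" % len(payload),
    ]
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + payload


def classify_status(code):
    """Map a SIP status code to the outcome the server acts on."""
    if 100 <= code < 200:
        return "provisional"   # e.g. 100: request is being processed
    if 200 <= code < 300:
        return "success"       # e.g. 200: device command succeeded
    return "failure"           # e.g. 400: device command failed


if __name__ == "__main__":
    request = build_do_request("sip:lamp@x10gw.example.com",
                               "sip:devicecontrol@example.com",
                               "command=on")
    print(request.decode("utf-8"))
```

Only the final response decides whether the device status is updated; a provisional response merely lets the server tell the caller that the request is in progress.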

Let us consider an example use case in the system:

Now let us look at the control flow for this use case at the implementation level:
  1. On receiving the initial call, the web server simply returns a welcome page to the browser, which prompts for the user's PIN.

  2. On receiving the PIN, the system first authenticates the user against the database. Once the user is authenticated, the system looks up the devices the user is authorized to control. It returns a device menu (containing all devices authorized for the user) to the browser, which prompts the user to select a device.

  3. On receiving a device selection, the server reports the current status of the device, obtained from the database, to the user. It then looks up the commands defined for that device (based on its device type) and sends a command menu to the browser, which prompts the user to select a command.

  4. On receiving a command selection, the server checks whether the command requires any extra arguments. If so, it prompts the user for each argument.

  5. At this point, the system has all the information it needs to execute the command on the device. Before processing the command, it sets a lock attribute in the database, which ensures that other users cannot operate on the same device at the same time and have to wait. If another process holds a lock on the device and the lock's timestamp (the time the lock was acquired) is within the acceptable limit (5 s), the current process waits.

  6. The server builds a SIP DO request message with the appropriate headers and content, starts the SIP stack, and passes the message to it. After processing and formatting the request, the stack sends it to the SIP-X10 device gateway directly or via a proxy. On receiving the request, the gateway sends a 100 provisional response back to the stack. The server informs the client with the message "Hold on. Your request is being processed".

  7. The gateway then sends the corresponding X10 message to the device via the X10 controller. The gateway reports the outcome back to the server: success (200) or failure (400).

  8. On receiving the final response, the server starts its updates. If the command was successful, the server first updates the device status.

  9. If the command was successful, the server then starts a new process to check event notifications and inform clients.

  10. This new process is given {user_id, device_id, command_id}, which identifies the user who successfully executed the command on the device. The process checks the subscriptions for the device. For each user who needs to be notified, it sends a SIP REFER to the Sipvxml interpreter. On receiving the REFER message, the interpreter initiates a new call to the user in a separate VoiceXML session. Once the SIP session with the user is established, the interpreter plays the notification message and then forwards the user to the welcome page of the device control center.

  11. In the meantime, after spawning the new process (to handle notifications), the web server performs the other necessary updates in the database. Spawning a separate process avoids delaying the current user. The server then releases the lock attribute in the database and informs the user that the command executed successfully. It then forwards the user back to the main menu page.
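
The locking described in steps 5 and 11 can be sketched as follows. This is an illustration in Python with the built-in sqlite3 module, not the project's Tcl code. The column names lock_owner and locked_at and the polling parameters are hypothetical, and treating a lock older than 5 s as stale, so that a waiting process may take it over, is an assumption inferred from the timestamp check in step 5.

```python
import sqlite3
import time

LOCK_TIMEOUT = 5.0  # seconds; the acceptable limit quoted in step 5


def try_lock(conn, device_id, owner, now=None):
    """Take the lock if it is free or stale; return True on success."""
    now = time.time() if now is None else now
    # A single UPDATE keeps the check and the write atomic in the database.
    cur = conn.execute(
        "UPDATE devices SET lock_owner = ?, locked_at = ? "
        "WHERE device_id = ? AND (lock_owner IS NULL OR locked_at < ?)",
        (owner, now, device_id, now - LOCK_TIMEOUT))
    conn.commit()
    return cur.rowcount == 1


def acquire_lock(conn, device_id, owner, poll=0.2, max_wait=10.0):
    """Wait for the lock, polling until it is obtained or max_wait elapses."""
    deadline = time.time() + max_wait
    while not try_lock(conn, device_id, owner):
        if time.time() >= deadline:
            return False
        time.sleep(poll)
    return True


def release_lock(conn, device_id, owner):
    """Release the lock, but only if this owner still holds it."""
    conn.execute(
        "UPDATE devices SET lock_owner = NULL, locked_at = NULL "
        "WHERE device_id = ? AND lock_owner = ?",
        (device_id, owner))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE devices (device_id INTEGER PRIMARY KEY, "
                 "lock_owner TEXT, locked_at REAL)")
    conn.execute("INSERT INTO devices (device_id) VALUES (10)")
    print(acquire_lock(conn, 10, "session-a"))
    release_lock(conn, 10, "session-a")
```

Recording the acquisition time alongside the owner means that a process that dies while holding the lock blocks other users for at most the timeout.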

Approximate Lines of Code: 3000
Language Used: Tcl/Tk

Program Documentation

Future Work

References


[1] SIP VoiceXML Browser  
[2] SIPC Documentation
[3] Tcl / Tk
[4] VoiceXML Spec
[5] Yahoo Event Notification System