Phone2Dev: Phone Device Control System

Stanislav Miroshnikov
Department of Computer Science
Columbia University
New York, NY 10027


Home automation is no longer in the realm of science fiction or magazines about the lives of the rich and famous. Companies like X10 now manufacture devices, and controllers for devices, that can be connected to a home network. There is a clear need to control these devices remotely, over the phone or the web. The same companies offer gateways for accessing the home network, but these are usually expensive, vendor-specific, and complex to install. However, with the emergence of mature technologies such as Voice XML, PHP, and the Session Initiation Protocol (SIP), we can now create device control systems that are relatively inexpensive, vendor-neutral, and easy to set up. In this project we implement such a system.


Related Work

A Voice XML gateway and a web server set up as a service for controlling devices is much cheaper and easier to maintain than a stand-alone device control appliance. As far as we know, no one has implemented such a system to date.


Device control, especially in home automation, is becoming increasingly popular. There exists a plethora of devices that can be controlled directly (networked appliances) or indirectly (via appliance controllers):

These devices can be controlled through different modes of control:

However, there is a lack of standardization. Devices can be connected to each other and to their controllers in many ways, e.g., custom-installed wiring, power line networks, and phone line networks, and they can communicate using a variety of protocols, e.g., UPnP, HAVi, and X10.

The Session Initiation Protocol (SIP) [4] is a mature, open protocol developed by the IETF, and it is a natural choice for communicating with all these different devices for several reasons. First, it supports a variety of communication paradigms that are important in a device control setting:

It can also deliver messages to devices that lack SIP user agents (UAs), through gateways between SIP and other device control protocols. Devices on a home network are easy to address through a hierarchical naming scheme and can be sent control messages using the Device Message Protocol (DMP) MIME type [3]. Another advantage of SIP is that we can reuse the infrastructure already built for it to control devices, so important concerns such as NAT traversal are already addressed. For further details, refer to the IETF draft "Framework Draft for Networked Appliances using the Session Initiation Protocol" [2], which originally proposed using SIP to control networked appliances.
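
To make the messaging path concrete, the following is a minimal Python sketch of how a control message might be wrapped in a SIP request addressed through a hierarchical device name. It is not the project's script. It assumes the MESSAGE method (defined later in RFC 3428) and the branch-ID convention of RFC 3261, both later than RFC 2543 [4]. The naming helper `device_uri`, the host names, the MIME type `application/dmp+xml`, and the XML body are hypothetical placeholders; the actual DMP type and schema are defined in [3]. Transmission to the home SIP proxy is not shown.

```python
import uuid


def device_uri(domain: str, *path: str) -> str:
    """Hypothetical hierarchical naming: most specific label first.

    device_uri("home.example.com", "livingroom", "lamp")
    returns "sip:lamp.livingroom@home.example.com".
    """
    return "sip:" + ".".join(reversed(path)) + "@" + domain


def build_sip_message(target_uri: str, from_uri: str, body: str,
                      content_type: str = "application/dmp+xml") -> str:
    """Wrap a control body in a SIP MESSAGE request.

    content_type is only a placeholder for the DMP MIME type.
    """
    body_bytes = body.encode("utf-8")
    host = target_uri.split("@", 1)[-1]
    lines = [
        f"MESSAGE {target_uri} SIP/2.0",
        # RFC 3261 branch IDs start with the magic cookie z9hG4bK.
        f"Via: SIP/2.0/UDP gateway.example.com;branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}@{host}",
        "CSeq: 1 MESSAGE",
        f"Content-Type: {content_type}",
        # Content-Length counts bytes of the body, not characters.
        f"Content-Length: {len(body_bytes)}",
    ]
    # Header lines end in CRLF; an empty line separates headers from body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


uri = device_uri("home.example.com", "livingroom", "lamp")
print(build_sip_message(uri, "sip:phone2dev@provider.example.com",
                        '<command name="on"/>'))
```

Keeping the addressing helper separate from message construction means the same builder works for any device in the hierarchy; only the URI changes.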


The overall design of the system follows the architecture suggested in the IETF draft [2]. It assumes a set of networked appliances that can be controlled by a SIP proxy located on the same network. This SIP proxy can be accessed from the Internet through a firewall/NAT using standard methods.

The implementation details of the SIP proxy, however, are outside the scope of this project, which focuses on the other side of the firewall/NAT: a system that combines two other major technologies, Voice XML and PHP, with SIP to control devices over regular or Voice over IP (VoIP) telephones. We envision this system being deployed by a device control service provider. This would offer considerable savings and improvements over today's phone control systems, which are stand-alone appliances that must be placed on the user's network. Essentially, all the provider would have to run is a PHP-enabled web server supporting web and voice interfaces for controlling the devices. This would lower cost, because the server is shared by many users rather than operated for a single one, and would improve functionality and reliability, because the provider's professional staff would upgrade and maintain it. Moreover, the provider could offer access to its voice interface through third-party Voice XML gateway providers. This gives customers another significant benefit: access, through regular and VoIP telephones, to commercial-grade Voice XML gateways, which are much more powerful than those in stand-alone appliances.

In this project we have designed the architecture for such a system. It is organized into three parts: the backend, the web interface, and the voice interface.

Backend

The backend is implemented as a class that completely hides database access from the web and voice interfaces. The database was designed to let users configure the devices they want to control, together with the menu layout and the commands they want to send to those devices.
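
A minimal Python sketch of such a backend class follows. It assumes SQLite from the standard library in place of the project's MySQL server, and the table and method names (`users`, `devices`, `menus`, `commands`, `add_device`, `submenus`, and so on) are hypothetical. It only illustrates how nested menus and commands might be tied to a user's devices while keeping all SQL out of the presentation layers.

```python
import sqlite3

# Hypothetical schema; the project itself used MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS devices (
    id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id),
    name TEXT NOT NULL, sip_uri TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS menus (
    id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id),
    parent_id INTEGER REFERENCES menus(id), label TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS commands (
    id INTEGER PRIMARY KEY, menu_id INTEGER NOT NULL REFERENCES menus(id),
    device_id INTEGER NOT NULL REFERENCES devices(id),
    label TEXT NOT NULL, body TEXT NOT NULL);
"""


class Backend:
    """All SQL lives here; the web and voice layers only call these methods."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
        self.db.executescript(SCHEMA)

    def _insert(self, sql: str, params: tuple) -> int:
        with self.db:  # commit on success, roll back on error
            return self.db.execute(sql, params).lastrowid

    def add_user(self, name: str) -> int:
        return self._insert("INSERT INTO users (name) VALUES (?)", (name,))

    def add_device(self, user_id: int, name: str, sip_uri: str) -> int:
        return self._insert(
            "INSERT INTO devices (user_id, name, sip_uri) VALUES (?, ?, ?)",
            (user_id, name, sip_uri))

    def add_menu(self, user_id: int, label: str, parent_id=None) -> int:
        return self._insert(
            "INSERT INTO menus (user_id, parent_id, label) VALUES (?, ?, ?)",
            (user_id, parent_id, label))

    def add_command(self, menu_id: int, device_id: int, label: str, body: str) -> int:
        return self._insert(
            "INSERT INTO commands (menu_id, device_id, label, body) VALUES (?, ?, ?, ?)",
            (menu_id, device_id, label, body))

    def submenus(self, user_id: int, parent_id=None) -> list:
        # "IS" matches NULL for top-level menus and compares normally otherwise.
        return self.db.execute(
            "SELECT id, label FROM menus WHERE user_id = ? AND parent_id IS ? ORDER BY id",
            (user_id, parent_id)).fetchall()

    def commands(self, menu_id: int) -> list:
        # Join in the device URI so callers can send the command directly.
        return self.db.execute(
            "SELECT c.id, c.label, c.body, d.sip_uri FROM commands c "
            "JOIN devices d ON d.id = c.device_id WHERE c.menu_id = ? ORDER BY c.id",
            (menu_id,)).fetchall()


be = Backend()
uid = be.add_user("alice")
lamp = be.add_device(uid, "Living room lamp", "sip:lamp.livingroom@home.example.com")
lights = be.add_menu(uid, "Lights")
be.add_command(lights, lamp, "Turn on", "<command name='on'/>")
print(be.submenus(uid))  # [(1, 'Lights')]
print(be.commands(lights))
```

Because both interfaces call the same methods, a menu added through the web pages is immediately available to phone callers.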

Web Interface

The web interface uses the backend class to provide the following functionality:

Users must first register in the database using the registration module and then log in using the login module. Once logged in, they can add devices, menus, and commands from the main menu, which directs them to the modules supporting the requested actions.
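
The register-then-log-in flow can be sketched as follows. The report does not say how the original modules stored credentials, so this minimal Python sketch assumes salted PBKDF2 hashes from the standard `hashlib` module, with a dictionary standing in for the users table. The functions `register` and `login` and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the server hardware

_users = {}  # hypothetical stand-in for the users table: name -> (salt, hash)


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(name: str, password: str) -> bool:
    """Create an account with a fresh random salt; refuse duplicate names."""
    if name in _users:
        return False
    salt = os.urandom(16)
    _users[name] = (salt, _hash(password, salt))
    return True


def login(name: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    record = _users.get(name)
    if record is None:
        return False
    salt, stored = record
    # compare_digest runs in constant time, so timing does not leak matches.
    return hmac.compare_digest(stored, _hash(password, salt))


register("alice", "s3cret")
print(login("alice", "s3cret"))  # True
print(login("alice", "wrong"))   # False
```

Storing only salted hashes means a leaked users table does not directly reveal passwords.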

Voice Interface

Through the voice interface, the user can access only a subset of the functionality, because of the limited accuracy of speech recognition engines and the inherent limitations of voice input; for example, it would be very hard to specify by voice the XML body of a command that turns on a light switch. The supported functionality is:

The voice interface accepts both voice and DTMF (touch-tone) input to accommodate users in noisy environments and users whom the speech recognition engine has trouble recognizing. Users can log in through the voice login module, navigate the same menus, and send the same commands as in the web interface.
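
To illustrate how a single menu can serve both voice and DTMF callers, here is a minimal Python sketch that renders menu entries as a VoiceXML 2.0 `<menu>`, binding each `<choice>` to both a spoken phrase and a keypad digit. The project generated Voice XML from PHP, so this is only an analogue. The function name `render_vxml_menu`, the entry format, and the `menu.php?id=...` URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_vxml_menu(menu_id: str, entries: list[tuple[str, str]]) -> str:
    """Render (label, next_url) pairs as a VoiceXML 2.0 <menu>.

    Each choice can be selected by speaking its label or by pressing its
    digit, so callers in noisy places can fall back to the keypad.
    """
    if not 1 <= len(entries) <= 9:
        raise ValueError("a single-digit DTMF menu needs 1 to 9 entries")
    vxml = ET.Element("vxml", version="2.0", xmlns="http://www.w3.org/2001/vxml")
    menu = ET.SubElement(vxml, "menu", id=menu_id)
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = " ".join(f"For {label}, say {label} or press {i}."
                           for i, (label, _) in enumerate(entries, start=1))
    for i, (label, next_url) in enumerate(entries, start=1):
        choice = ET.SubElement(menu, "choice", dtmf=str(i), next=next_url)
        choice.text = label  # the phrase the recognizer listens for
    # ElementTree escapes special characters such as "&" in URLs.
    return ET.tostring(vxml, encoding="unicode")


print(render_vxml_menu("main", [("lights", "menu.php?id=2"),
                                ("heating", "menu.php?id=3")]))
```

VoiceXML 2.0 can also assign digits implicitly with `<menu dtmf="true">`; explicit digits are used here so that the spoken prompt and the keys always agree.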


Splitting the system into a backend, a web interface, and a voice interface was motivated by the benefits of separating application logic from display logic. Either can be updated without touching the other, which promotes code reuse and isolates each from changes in the other. The same database access routines generate both web and voice content, so that code does not have to be duplicated. The system can also easily support templating, which improves consistency of appearance and lets maintainers change the site's design without modifying the backend code, and vice versa. The system was implemented in PHP 4.2+ in approximately 1,500 lines of code.
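
The templating idea can be sketched with Python's standard `string.Template`: the backend supplies plain data, while the markup lives in a template that a designer can edit without touching data-access code. The template text and the `render_page` helper are hypothetical and are not taken from the project.

```python
from html import escape
from string import Template

# Hypothetical page template owned by the site designers; changing it
# requires no change to the backend or to render_page().
PAGE = Template("""<html><head><title>$title</title></head>
<body><h1>$title</h1>
<ul>
$items
</ul></body></html>""")

ITEM = Template('<li><a href="$href">$label</a></li>')


def render_page(title: str, entries: list[tuple[str, str]]) -> str:
    """Fill the templates with (label, href) pairs, escaping user-supplied text."""
    items = "\n".join(ITEM.substitute(label=escape(label), href=escape(href))
                      for label, href in entries)
    return PAGE.substitute(title=escape(title), items=items)


print(render_page("Lights", [("Porch light", "menu.php?id=2")]))
```

Escaping happens in the rendering step rather than in the templates, so designers can rewrite the markup freely without reintroducing injection bugs.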

Program Documentation

Task List

The following tasks were accomplished in the course of the project.

  1. Performed background research
  2. Designed the system
  3. Designed the database that contains the user, device, device capability, and permission data.
  4. Set up a MySQL server and implemented the design as an SQL schema.
  5. Studied Voice XML and the PHP Voice library for generating Voice XML applications.
  6. Studied PHP and its libraries for database access and for form creation and validation.
  7. Created scripts to generate Voice XML for login, menus, and commands.
  8. Created scripts to generate web pages for login, registration, menus, commands, and logout.
  9. Created a script for sending SIP control messages to devices.
  10. Tested the system using a BeVocal Cafe Voice XML gateway.

References

[1] M. Miller, Voice XML: 10 Projects to Voice Enable Your Web Site, Wiley Publishing, Inc., New York, NY, 2002.
[2] S. Moyer, D. Marples, S. Tsang, J. Katz, P. Gurung, T. Cheng, A. Dutta, H. Schulzrinne, A. Roychowdhury, Framework Draft for Networked Appliances using the Session Initiation Protocol, draft-moyer-sip-appliances-framework-01.txt.
[3] S. Khurana, A. Dutta, Device Message Protocol (DMP): An XML based format for Wide Area Communication with Networked Appliances, draft-khurana-dmp-appliances-00.txt, 2000.
[4] M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg, SIP: Session Initiation Protocol, IETF RFC 2543, 1999.