ARCHWAYS: Making Remote Multimedia Conversations Persistent and Natural

Sudhir R. Ahuja, J. Robert Ensor, Dorée D. Seligmann
To appear in Proceedings of Technology Summit Telecom '95

Abstract
Archways is a distributed computer program that manages computer- and network-based resources to support multimedia communication sessions within the context of long-term collaborations. The Archways service prototype is built on an underlying services creation and execution environment called MR. MR provides persistent contexts, called meeting rooms, in which services can be created and used. Archways uses the MR rooms to provide its users with a persistent environment for both real-time group interactions and long-term storage of shared information. Real-time interactions are supported by mechanisms for conducting meetings, defining participant meeting roles, coordinating information exchange in multiple media, etc. The persistent environment's long-term memory allows people to schedule meetings, relate one meeting to another, prepare meeting environments, record and replay meeting minutes, etc. This memory can also be used as a message system, allowing people to leave notes and programs for one another in well-known locations. The Archways user interface is based on the concept that conversations among remote parties occur within virtual spaces while the parties remain in their own physical spaces. Therefore, when an Archways user is communicating with another person or program, the system's user interface automatically generates three-dimensional visual and aural displays of the physical environment of the user and relates them to the displays that it generates for the virtual communication space. In summary, the interface portrays communication as a natural extension to the familiar, everyday physical environment of the user.

1. INTRODUCTION
Our studies of multimedia communication originated several years ago with our development of a desktop conferencing system, called Rapport.1, 2, 3 That system permits geographically distributed users to interact with each other through a wide array of media, including voice, video, and computer-based text, graphics, and images. More recently, we have extended the concepts of desktop conferencing to develop software support for long-term collaborations. In particular, we have constructed a system, called Archways, that directly supports multimedia communication sessions within the context of long-term group activities. The system has two important characteristics, which are the focus of this paper.

Archways provides a persistent environment that supports both individual and group activities commonly associated with collaborative work. Focusing on private access to shared information as well as real-time exchange of data during multi-party communication sessions, this environment offers a means of seamless transition between private and shared tasks. It provides descriptions of each work session and the context in which these sessions occur. Thus, it provides a means of coordinating tasks and relating one work session to another. The Archways persistent environment also serves as a repository for information, represented in various media, that is used during private and group work sessions.

Like its persistent communication environment, the Archways user interface is designed to support both individual and group activities. The system's interface helps its users access both private information and shared information during private work sessions. It also describes individual and group actions during multi-party communication sessions. The look and feel of the Archways user interface is based on graphical representations of the objects in a person's real-world physical environment as well as explicit graphical representations of communication activities. Because the interface is based on the user's familiar, everyday environment, it is generally perceived as "natural," relating communication sessions to a known physical context.

2. ROOMS--PERSISTENT ENVIRONMENTS
Archways is built upon a services environment called MR (for Meeting Rooms). MR is a distributed computer- and network-based system that provides contexts (called rooms) in which communications take place. Exploiting the fundamental role of context in communications, MR supports a wide variety of services based on communications in one or more media. These services include conventional telephony, real-time multimedia conferencing, distributed game playing, and multimedia electronic mail exchange. Room-based systems provide especially significant support for collaborations. In addition to helping people conduct real-time multimedia, multi-party meetings, these systems help people record meeting minutes, exchange notes, store and retrieve data, schedule activities, etc.

Figure 1.

As Figure 1 illustrates, MR is built as a specialized client-server system. Each user is represented by a conversation manager, which is a client of an MR server. An MR server manages two sets of resources--rooms and meetings. A room is a persistent context in which a user may access one or more services. With these services, users may interact in group activities or may access information individually. Thus, a room serves as an electronic place of rendezvous--a meeting place--and an electronic environment in which to create and offer services--a place for information storage and exchange. A meeting (also called a session) is a period of user activity within a room. MR coordinates the use of services by controlling user access to each room and its associated services and communication sessions. MR also controls which services may be associated with a room through its registration and resolution of service names. The system can also impose security and billing functions on user requests. The server provides basic abstractions for service control protocols, which encourages development of common environments for service access. In fact, Archways implements one such environment for a fixed set of services.
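The division of an MR server's resources into rooms and meetings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (MRServer, register_service, associate_service, enter, leave) are hypothetical and are not the MR interface, and access control is reduced to a simple list of allowed users.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Meeting:
    """A period of user activity (a session) within a room."""
    participants: set = field(default_factory=set)


@dataclass
class Room:
    """A persistent context with which services are associated."""
    name: str
    allowed_users: set
    services: set = field(default_factory=set)
    meeting: Optional[Meeting] = None  # None while nobody is in the room


class MRServer:
    """Hypothetical stand-in for an MR server, not the actual MR interface."""

    def __init__(self):
        self.rooms = {}             # room name -> Room
        self.service_registry = {}  # service name -> server address

    def register_service(self, service_name, address):
        self.service_registry[service_name] = address

    def create_room(self, name, allowed_users):
        self.rooms[name] = Room(name, set(allowed_users))
        return self.rooms[name]

    def associate_service(self, room_name, service_name):
        # Only a registered service name can be resolved and attached.
        if service_name not in self.service_registry:
            raise KeyError(f"unknown service: {service_name}")
        self.rooms[room_name].services.add(service_name)

    def enter(self, room_name, user):
        room = self.rooms[room_name]
        if user not in room.allowed_users:
            raise PermissionError(f"{user} may not enter {room_name}")
        if room.meeting is None:
            room.meeting = Meeting()
        room.meeting.participants.add(user)
        return room.meeting

    def leave(self, room_name, user):
        room = self.rooms[room_name]
        if room.meeting is not None:
            room.meeting.participants.discard(user)
            if not room.meeting.participants:
                room.meeting = None  # the meeting ends; the room persists
```

In this sketch a meeting disappears when its last participant leaves, while the room and its service associations remain, which mirrors the distinction drawn above between a persistent room and a session held within it.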

In the MR architecture, each service accessed through an MR server is built according to a client/server structure. Figure 1 shows two services, service A, which is implemented by server A and its clients--the local managers A--and service B, which is implemented by server B and its clients--the local managers B. Each client represents a system user and is called a local manager because it manages the local resources used in the service for its associated user. As a particular example, service A could be an on-line data access service and service B could be a home banking service. The local on-line data access manager of user 1 would manage the local resources needed to provide this service to user 1. Similarly, the home banking manager of user 1 would manage the local resources needed to provide this service to user 1. User 2 would also have local managers for these services. Activities of the local managers of a user are coordinated by the user's conversation manager (labeled CM in the figure). For example, if both the on-line data access service and the home banking service needed to exchange information with remote servers, the conversation manager would help them coordinate their use of common transmission resources. Similarly, the activities of a set of servers are coordinated by an MR server. Therefore, a user's conversation manager and an MR server act as the heads of dual hierarchies of managers and servers. Additional structure is possible within the MR architecture. For example, a server may be built by composition of other servers. Similarly, a local manager may have local clients (such as a user interface--labeled UI in the figure), and may also act as a client of local servers. A server and each of its associated managers share a generic protocol, which defines service association with rooms. Every component of a service also shares a service-specific protocol, which defines how the service is used within a room.
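A minimal Python sketch may help make the client side of this dual hierarchy concrete. It assumes, purely for illustration, that the generic protocol consists of two message kinds ("associate" and "dissociate") and that the common transmission resource is a single link with a fixed capacity. The names LocalManager, ConversationManager, request_bandwidth, and dispatch are hypothetical and do not come from the MR implementation.

```python
class LocalManager:
    """Client side of one service for one user (hypothetical interface)."""

    def __init__(self, service_name):
        self.service_name = service_name
        self.rooms = set()      # rooms this service is associated with
        self.service_log = []   # service-specific messages handled so far

    def handle(self, kind, room, payload=None):
        # Generic protocol (assumed message kinds): service association.
        if kind == "associate":
            self.rooms.add(room)
        elif kind == "dissociate":
            self.rooms.discard(room)
        # Service-specific protocol: only valid inside an associated room.
        elif room in self.rooms:
            self.service_log.append((room, kind, payload))
        else:
            raise ValueError(f"{self.service_name} is not associated with {room}")


class ConversationManager:
    """Coordinates one user's local managers and their shared link."""

    def __init__(self, user, link_capacity):
        self.user = user
        self.link_capacity = link_capacity
        self.managers = {}     # service name -> LocalManager
        self.allocations = {}  # service name -> units of capacity granted

    def add_manager(self, manager):
        self.managers[manager.service_name] = manager

    def request_bandwidth(self, service_name, amount):
        """Grant as much of the request as the shared link still allows."""
        used = sum(self.allocations.values())
        granted = max(0, min(amount, self.link_capacity - used))
        self.allocations[service_name] = (
            self.allocations.get(service_name, 0) + granted
        )
        return granted

    def dispatch(self, service_name, kind, room, payload=None):
        self.managers[service_name].handle(kind, room, payload)
```

For example, a conversation manager with 100 units of link capacity that grants 70 units to the on-line data access manager can offer the home banking manager at most the remaining 30.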

The MR architectural components described in the preceding paragraphs are built upon transport services. The underlying transport is used to exchange the control messages that coordinate the activities of conversation managers, MR servers, and the servers and clients (local managers) that have been associated with rooms. It is also used to exchange service-specific information among servers and their clients. The MR architecture separates transport channel control from service access and from session management. Rooms provide a rendezvous mechanism for accessing services and conducting communication sessions. Therefore, a user can move physical locations, change hardware configurations, and still access a given room. Connections are simply dropped when not needed and (re)established to hold meetings within a room. A user can access a room from any point as long as he or she is represented by a local conversation manager that can communicate with an MR server. Similarly, each service accessed within a room is represented by local managers that communicate with their corresponding servers. For example, a user can create a virtual meeting room by issuing requests from her office computer. She can begin to execute a program within the room. She may then leave the virtual room (although it continues to exist) and her physical office and drive to her house. During the drive, she may re-enter the room through her car telephone, talking with other parties in the meeting and accessing information within the room through audio interfaces. At her house, she can use a different device, e.g., a set-top box, to reestablish contact with the MR server and re-enter the virtual room where the still-executing program is located.
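The scenario above depends on room state being kept apart from any particular transport connection. The following Python sketch illustrates that separation under simplifying assumptions: the room's program is reduced to a step counter, and a device is described only by the set of media it supports. PersistentRoom and its methods are hypothetical names used for illustration.

```python
class PersistentRoom:
    """Room state is held apart from any user's connection."""

    def __init__(self, name):
        self.name = name
        self.program_steps = 0   # progress of a program executing in the room
        self.connections = {}    # user -> media supported by the current device

    def tick(self):
        # The program keeps running whether or not anyone is connected.
        self.program_steps += 1

    def enter(self, user, device_media):
        # (Re)establish a connection from whatever device the user has now.
        self.connections[user] = frozenset(device_media)

    def leave(self, user):
        # Dropping the connection does not touch the room's state.
        self.connections.pop(user, None)

    def view_for(self, user):
        """Describe the room using only the media the user's device supports."""
        status = f"program at step {self.program_steps}"
        renderings = {
            "audio": f"spoken: {status}",
            "graphics": f"displayed: {status}",
        }
        media = self.connections[user]
        return {m: text for m, text in renderings.items() if m in media}
```

A user who leaves from an office workstation and re-enters from an audio-only device sees the same program, further along, rendered in the one medium that device offers.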

3. USER INTERFACE
Building the user interfaces for Archways prototypes has given us the opportunity to explore new ways to represent multimedia communications. The latest result of our studies is an interface based upon the following model of long distance communications. When a person interacts with a remote entity, the person remains in his or her physical location but is, simultaneously, in a virtual environment defined by the interaction. For example, when people are talking with each other via a telephone call, each of them is in a real world physical location as well as in the virtual place which hosts the telephone conversation. Similarly, when several people play a multi-player game on the Internet, each player is in a physical place and in the virtual place defined by the operations and contents of the game playing service. These dual environments are not created only for communication over distance among people; these virtual environments also arise when a person or program accesses a remote service. For example, when a CD player transmits an audio stream, it is in some place within the real world as well as in a virtual place with the people and/or programs receiving its output.

When an Archways user is communicating with one or more remote parties, Archways generates a representation of the real world location of each communicating party and a representation of the virtual place hosting the communication. Furthermore, Archways can generate customized visualizations of each of these environments for each party. The visualizations of an environment illustrate the people and things within the location, and conceptual relationships among these entities. The Archways visualizations themselves are multimedia displays, consisting of text, 2D and 3D graphics, images, monophonic and 3D sound, and so on.

Archways uses knowledge-based graphics techniques similar to those in IBIS 7,9 to generate automatically the customized views for each user. The view generation is based on 1) knowledge about presenting conference state information, 2) knowledge about generating multimedia representations and cues for media-specific events,5,6 3) knowledge about the topography of real world objects, especially the relevant communication devices, and 4) knowledge about the placement and dependencies among the various elements in the visual representations. A user may be presented with multiple views of a communication environment, each view designed to fulfill a specific purpose. For example, views may be generated to 1) show the contents of a virtual place, 2) illustrate what devices are accessible to a user and show their current state, 3) represent a user's location in the real world and in a virtual communication environment, and 4) indicate the status of information exchange operations. Every object in a visualization is designed to convey one or more concepts. Each view is constrained by its original communicative intent8 although users can manipulate the camera parameters, access interaction objects, or trigger object-specific behavior.

The Archways visualization of real and virtual places is based on representations of the places themselves, as well as the people, objects, and services in these places. The visualizations also include indications of the relationships and associations among these objects. The visualization of a real-world location is, of course, incomplete. Using a description of a physical location, Archways creates a view with a corresponding floor plan and representation of the devices (such as telephones, computer monitors, and keyboards) used in multimedia communication. Support objects are created for devices and people. For example, desks serve two purposes: in virtual places they represent shared surfaces,10 in real places they are included to avoid displays in which objects float in midair.4 The representation of a person is based upon a full-body photograph. The photo is manipulated for color-coding and ghosting, which are the techniques used to show a person's role in a location and a person's presence in multiple locations. Archways provides a rich set of representations for the services present in an environment. All devices that are associated with a specific service (e.g., a monitor used to display movies seen through a video on demand service) are portrayed as providing input or output for that service. The outputs of services (e.g., shared documents, application program displays, video streams, and audio streams) are displayed as media objects. Virtual "devices" are used to represent interaction mechanisms such as pointers and messages. Relationships among these objects are also illustrated. "Real connection" objects represent real wires while "virtual connection" objects represent data flows.

In addition to generating visualizations of these environments, the Archways user interface coordinates the presentation methods of the services associated with the environment. Thus, the individual data streams, visuals, and audio coalesce into cohesive and consistent representations of the environments. Furthermore, Archways is not tied to any particular configuration of services; rather, it is designed to support dynamic sets of services as they are associated with a communication environment. Because the Archways user interface manages the 3D interaction objects to control the displays generated by other media services, it is much more than the user interface to a particular multimedia application.

EXAMPLES

Plate 1.

This section discusses several examples of visualizations generated by the Archways user interface. In the first example, which is illustrated in Plate 1, John and Dorée are in a conversation together. Archways has generated a visualization of their situation, showing a square soap-bubble virtual room hovering over representations of their offices. Their photo-based images appear in the virtual room and in their offices. Archways highlights their presence in the virtual meeting room by ghosting their images in their offices. The devices that they are using to communicate, their phones and computers, are represented in the virtual room as well as their offices. Cables are shown connecting these devices to indicate the flow of information between their private spaces and the shared space.

Plate 2.

In our second example, four people--Dorée, John, Cati, and Sid--are browsing (i.e., "surfing") the Internet together by sharing a Mosaic browser during a multimedia communication session. The Archways user interface creates a unique representation of this interaction for each "surfer." The virtual room representation created for Sid is shown in Plate 2. At the moment illustrated by this photo, all four people are looking at Cati's home page which features Cati's fake stamp art collection.

Figure 2.

Figure 2 outlines the architecture of the Archways prototype used to generate the examples under discussion. The MR server and the conversation managers at each local site provide the basis of the Archways prototype. Two other services--N-ICE (Networked-Interactive Collaborative Environment), which is an application sharing service, and the holophonic service, which is a 3D sound imaging service--have been used in the prototype supporting the example scenario. The N-ICE server logs application program events and N-ICE-specific events. The N-ICE manager at each local site connects to a local X Window server and local X Window applications. The holophonic server uses a CRE Convolvotron to create a 3D sound space. Special effects (such as the clicks of keystrokes) are produced by a set of SoundF/X servers. Archways sends location information to the holophonic server to position audio objects in the environment, and requests that time-based sound effects be executed. Prior to producing the image shown in Plate 2, Archways interacted with the other services in the MR environment. Archways received notifications from the MR server when the room was created, when the media services were associated with it, and when the various people joined the meeting. Archways received notifications from N-ICE when the Mosaic browser was first registered with the room and then executed, and continued to receive events relating to each person's interaction with the browser. The Mosaic browser is an arbitrary X Window application program executing on a native X Window server. N-ICE manages windowed displays so that they appear in shared workspaces. N-ICE represents application programs, their state, and the windows associated with them, and it processes all input and output events related to each, all within the context of virtual rooms. Archways was notified of these events, which enabled it to generate live copies of the program's windows and place them in the scene.

As each notification was received, Archways generated and placed the objects to represent this virtual room. The room itself is represented as a square soap-bubble hovering over the representations of the communicating parties' real world locations. The people are represented by color-coded, texture-mapped images. The round table indicates that the room is for conferencing. The image of the shared Mosaic browser appears on the table, as do each person's devices (animated keyboards and mice), as shown in Plate 3.

Plate 3.
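The notification-driven construction of the scene described above can be sketched as a small dispatcher in Python. The notification sources ("mr", "nice"), the event names, and the object kinds are assumptions chosen for illustration. They are not the actual MR or N-ICE message formats, and the real system applies knowledge-based design rules rather than a fixed table of handlers.

```python
class SceneBuilder:
    """Builds a scene description incrementally from service notifications."""

    def __init__(self):
        self.objects = []  # (kind, label) pairs, in order of creation

    def on_notification(self, source, event, detail):
        # Look up a handler named after the (hypothetical) source and event.
        handler = getattr(self, f"_on_{source}_{event}", None)
        if handler is None:
            return False  # unrecognized notifications are ignored
        handler(detail)
        return True

    def _on_mr_room_created(self, detail):
        # A conferencing room is shown as a bubble containing a round table.
        self.objects.append(("room", detail["room"]))
        self.objects.append(("table", detail["room"]))

    def _on_mr_person_joined(self, detail):
        # Each person arrives with an image and representations of devices.
        person = detail["person"]
        self.objects.append(("person", person))
        self.objects.append(("keyboard", person))
        self.objects.append(("mouse", person))

    def _on_nice_app_registered(self, detail):
        # A shared application contributes a live copy of its window.
        self.objects.append(("window", detail["app"]))
```

Feeding the builder a room-creation notification, a notification for each arriving person, and one application registration yields the list of objects to be placed, in the order in which the notifications arrived.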

Plate 4 shows a composite view: the visualization of the virtual room automatically generated for Sid's perspective. It consists of two displays. The top view is designed to show the other conferees. The bottom view is designed to show the shared objects. The viewing parameters are sensitive to the contents of the virtual room: as people come and go, the view changes to include them all. Furthermore, if the user changes the viewing parameters, certain objects will move to satisfy viewing constraints. For example, the image of the Mosaic browser will rotate so that the window is always facing the user.

Plate 4.
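Two of the viewing constraints mentioned above, keeping a window turned toward the user and keeping every conferee in view, reduce to simple geometry on the floor plane. The Python sketch below is an illustration rather than the Archways method: it assumes positions are given as (x, z) floor coordinates, that an object's front points along +z when unrotated, and that the camera looks at the centroid of the points to be framed. The function names are hypothetical.

```python
import math


def facing_yaw(obj_xz, camera_xz):
    """Yaw in radians, about the vertical axis, that turns an object whose
    front points along +z so that it faces the camera."""
    dx = camera_xz[0] - obj_xz[0]
    dz = camera_xz[1] - obj_xz[1]
    return math.atan2(dx, dz)


def framing_distance(points_xz, fov_radians):
    """Return the centroid of the points and a camera distance from it at
    which a circle enclosing all the points fits within the field of view."""
    cx = sum(p[0] for p in points_xz) / len(points_xz)
    cz = sum(p[1] for p in points_xz) / len(points_xz)
    # Radius of a circle, centered on the centroid, that encloses every point.
    radius = max(math.hypot(p[0] - cx, p[1] - cz) for p in points_xz)
    # A circle of this radius fits in a view cone of half-angle fov/2 when
    # its center is at least radius / sin(fov/2) away from the camera.
    return (cx, cz), radius / math.sin(fov_radians / 2)
```

When another person joins and the enclosing circle grows, the framing distance grows with it, so the camera pulls back to include everyone. Whenever the user moves the camera, facing_yaw gives the rotation that turns the browser image back toward the viewer.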

Plate 2 was produced when Archways augmented the views of Plate 4 with visual cues that are designed to provide information about each participant's activity and focus. For example, the stamp floating in front of Cati indicates that she has focused her attention on a spawned viewer of one of her stamps. Archways generated the red line connecting Dorée's keyboard to the browser display, indicating that she currently has input control of the browser. John is using a telepointer. This is indicated by the blue pointer from his mouse to the point of his attention, a stamp on the shared browser.

This example illustrates several salient features of the system. First, once the virtual place was created, the round table grew in size as more people entered. Archways assigned a color to each participant and placed each around the table. The spatial locations were sent to the holophonic service, which convolves the monophonic audio from each person's microphone so that each participant is presented with a stereo pair from his or her perspective in the virtual place. During the course of this session, Archways generated several multimedia cues. When Dorée entered input via her keyboard, the keys on the graphical keyboard associated with her moved and sampled key-click sounds were sent to the holophonic service where they were added to the 3D sound space. When John used his mouse as a pointer, a color-coded arrow was displayed and tapping sounds were added to the 3D sound space.
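The placement of participants around a growing table, and the positions that would be passed to a spatial audio service, can be sketched in Python as follows. All specifics here are assumptions made for illustration: seats are spaced evenly on a circle, the table radius is chosen so that each seat gets a fixed arc length, colors are drawn cyclically from a small palette, and each listener is assumed to face the center of the table. None of these rules is taken from the Archways implementation, and the convolution performed by the holophonic service is not modeled.

```python
import math

PALETTE = ["red", "blue", "green", "yellow"]  # hypothetical color choices


def table_radius(n_people, seat_width=0.8, min_radius=0.6):
    """Radius that gives each person at least seat_width of circumference."""
    return max(min_radius, n_people * seat_width / (2 * math.pi))


def seat_participants(names):
    """Assign each participant a color and an (x, y) seat around the table."""
    n = len(names)
    r = table_radius(n)
    seats = {}
    for i, name in enumerate(names):
        angle = 2 * math.pi * i / n
        seats[name] = {
            "color": PALETTE[i % len(PALETTE)],
            "position": (r * math.cos(angle), r * math.sin(angle)),
        }
    return seats


def azimuth_from(listener_xy, source_xy):
    """Signed angle in radians from the listener's facing direction (toward
    the table center at the origin) to the source; positive values are
    counterclockwise in this coordinate convention."""
    fx, fy = -listener_xy[0], -listener_xy[1]
    vx, vy = source_xy[0] - listener_xy[0], source_xy[1] - listener_xy[1]
    return math.atan2(fx * vy - fy * vx, fx * vx + fy * vy)
```

With four participants, the person seated directly opposite a listener lies straight ahead (an azimuth of zero), and the two neighbors lie at equal and opposite angles, which is the kind of per-listener information a 3D sound service would need to produce a stereo pair for each perspective.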

4. CONCLUSIONS
The Archways system directly supports multimedia communication sessions within the context of long-term group activities. The system has two noteworthy characteristics: It provides a persistent environment that supports both individual and group activities, and it has a user interface which relates communication activities to a user's familiar physical environment.

Archways is a communications service built upon the MR services environment. It uses the room environment of MR to provide a persistent environment for its users. This environment supports real-time exchange of data during multi-party communication sessions, as well as individual access to shared information. It provides descriptions of each work session and the context in which these sessions occur. Thus, it provides a means of coordinating tasks and relating one work session to another.

The Archways user interface also supports both individual and group activities. It is based on graphical representations of the objects in a person's real-world physical environment as well as explicit graphical representations of communication activities. Because the interface is based on the user's familiar, everyday environment, it is generally perceived as "natural," relating communication sessions to a known physical context.

REFERENCES
1) Ahuja, S.R., Ensor, J.R., and Horn, D.N., "The Rapport Multimedia Conferencing System," Proc. Office Information Systems, Palo Alto, CA, March 1988, pp. 1-8.

2) Ensor, J. R., Ahuja, S. R., Connaghan, R. B., Pack, M., and Seligmann, D. D. "The Rapport Multimedia Communication System." Proceedings of ACM SIGCHI '92 Human Factors in Computing Systems, Monterey, California, May 3-7, 1992.

3) Ensor, J. R., Ahuja, S. R., and Seligmann, D. D. "User Interfaces for Multimedia Multiparty Communications." Proceedings of IEEE International Conference on Communications ICC '93, Geneva, Switzerland, May 23-26, 1993.

4) Feiner, S. K. "APEX: An Experiment in the Automated Creation of Pictorial Explanations." IEEE Comp. Graphics and Applic. 5:11, 1985. pp. 29-37.

5) Seligmann, D. D., Mercuri, R. T., and Edmark, J. T. "Providing Assurances in a Multimedia Interactive Environment." Proc. of ACM SIGCHI '95, Denver, Colorado, May 7-11, 1995. pp. 250-256.

6) Seligmann, D. D., and Edmark, J. "User Interface Mechanisms for Assurances During Multimedia Multiparty Communication." 1st International Workshop on Networked Reality in Telecommunication, Tokyo, Japan, May, 1994.

7) Seligmann, D. D. "Interactive Intent-Based Illustrations: A Visual Language for 3D Worlds." PhD Thesis, Dept. of Computer Science, Columbia University. 1993.

8) Seligmann, D. D., and Feiner, S. "Supporting Interactivity in Automated 3D Illustrations." Proc. of the 1993 International Workshop on Intelligent Interfaces (IWII '93). Orlando, Florida, January 4-7, 1993. pp. 37-43.

9) Seligmann, D.D. and Feiner, S. "Automated Generation of Intent-Based 3D Illustrations." In Proc. ACM SIGGRAPH '91 (Computer Graphics, 25(4), July 1991). Las Vegas, Nevada, July 28-August 2, 1991. pp. 123-132.

10) Sohlenkamp, M., and Chwelos, G. "Integrating Communication, Cooperation, and Awareness: The DIVA Virtual Office Environment." Proc. of ACM CSCW '94. Chapel Hill, NC, October 22-26, 1994. pp. 331-343.

Biographies
Sudhir R. Ahuja received his Ph.D. in Electrical Engineering from Rice University. J. Robert Ensor received his Ph.D. in Computer Science from the State University of New York at Stony Brook. Dorée D. Seligmann received her Ph.D. in Computer Science from Columbia University. The authors are members of the Multimedia Communication Research Department at AT&T Bell Laboratories, which is headed by Dr. Ahuja. They are continuing their collaboration with further development of Archways and with creation of continuous media services.