The Distributed Virtual Communication Machine

Marcel-Catalin Rosu
College of Computing
Georgia Tech

Wednesday, May 6, 1998 
11am-12:15pm
Interschool Lab, 7th floor, Schapiro CEPSR Bldg.

Host: Gail Kaiser

Abstract

The Distributed Virtual Communication Machine (DVCM) is a communication architecture for tightly coupled clusters of workstations connected by high-speed networks. The architecture addresses one of the most important problems of cluster-based parallel computing: the inability to scale the performance of communication software along with CPU performance. The DVCM approaches this problem by providing a set of very low-overhead data transfer primitives, together with mechanisms for combining and extending these primitives into application-specific communication operations. The DVCM extends our previous work on the Virtual Communication Machine (VCM). Both architectures require coprocessor-equipped network interfaces, and their implementations consist mostly of firmware running on the network coprocessor.

The VCM is a programmable abstraction of the local network interface. Most VCM commands implement user-level, zero-copy messaging primitives. The VCM command set is extensible, enabling the transfer of communication-related processing from the application and/or host kernel to the VCM implementation. The DVCM extends the VCM architecture by providing a single programmable abstraction of the cluster network. The DVCM acts like an "active backplane", implemented in firmware, for the components of a parallel or high-performance distributed application. In addition to the VCM's messaging primitives, the DVCM includes mechanisms that enable the efficient implementation of application-specific cluster-wide operations as DVCM extension modules.

The talk presents the VCM and DVCM designs, their current implementations, and our experience using them.



Luis Gravano
gravano@cs.columbia.edu