The Distributed Virtual Communication Machine
Wednesday, May 6, 1998
11am-12:15pm
Interschool Lab, 7th floor, Schapiro CEPSR Bldg.
Host: Gail Kaiser
Abstract
The Distributed Virtual Communication Machine (DVCM) is a communication
architecture for tightly coupled clusters of workstations connected by
high-speed networks. The architecture addresses one of the most important
problems of cluster-based parallel computing: the inability to scale
the performance of communication software along with CPU performance. The
DVCM approaches this problem by providing a set of very low-overhead data
transfer primitives, and mechanisms for combining and extending these primitives
into application-specific communication operations. The DVCM extends
our previous work on the Virtual Communication Machine (VCM). Both architectures
require coprocessor-equipped network interfaces, and their implementations
consist mostly of firmware running on the network coprocessor.
The VCM is a programmable abstraction of the local network interface.
Most VCM commands implement user-level, zero-copy messaging primitives.
The VCM command set is extensible, enabling the transfer of communication-related
processing from the application and/or host kernel to the VCM implementation.
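
As a rough illustration of what an extensible command set means, the sketch below models a command table in which built-in and application-supplied commands are dispatched the same way. It is not the VCM's actual interface: the class CommandTable, the opcodes OP_SEND and OP_CHECKSUM, and both handlers are hypothetical names chosen for this example, and the real implementation is firmware on the network coprocessor rather than host-side Python.

```python
from typing import Callable, Dict

# A handler takes a raw payload and returns a raw result.
Handler = Callable[[bytes], bytes]


class CommandTable:
    """Hypothetical dispatch table; not the VCM's real command interface."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Handler] = {}

    def register(self, opcode: int, handler: Handler) -> None:
        # Extensions are added the same way as built-in commands.
        if opcode in self._handlers:
            raise ValueError(f"opcode {opcode} already registered")
        self._handlers[opcode] = handler

    def execute(self, opcode: int, payload: bytes) -> bytes:
        try:
            handler = self._handlers[opcode]
        except KeyError:
            raise ValueError(f"unknown opcode {opcode}") from None
        return handler(payload)


OP_SEND = 1      # hypothetical built-in messaging command
OP_CHECKSUM = 2  # hypothetical application-specific extension


def send(payload: bytes) -> bytes:
    # Stand-in for a messaging primitive: hands the payload back unchanged.
    return payload


def checksum(payload: bytes) -> bytes:
    # Example of processing moved out of the application: 16-bit byte sum.
    return (sum(payload) & 0xFFFF).to_bytes(2, "big")


table = CommandTable()
table.register(OP_SEND, send)
table.register(OP_CHECKSUM, checksum)
```

In this sketch the application issues table.execute(OP_CHECKSUM, data) instead of computing the checksum itself, which mirrors the idea of moving communication-related processing out of the application.
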
The DVCM extends the VCM architecture by providing a single programmable
abstraction of the cluster network. The DVCM acts like an "active backplane",
implemented in firmware, for the components of a parallel or high-performance
distributed application. In addition to the VCM's messaging primitives,
the DVCM includes mechanisms that enable the efficient implementation of
application-specific cluster-wide operations as DVCM extension modules.
The talk presents the VCM and DVCM designs, their current implementations,
and our experiences with using them.
Luis Gravano
gravano@cs.columbia.edu