Prof. Salvatore J. Stolfo

Department of Computer Science, Columbia University, New York, NY 10027
sal@cs.columbia.edu

Member of the Polytechnic University Center for Advanced Technology

Active/Deductive Databases System Reorganization and "Predictive" Load Balancing

In this work we developed new Predictive Dynamic Load Balancing algorithms, as well as new Balanced Parallel Join algorithms operating over heterogeneous processing sites. This work is the subject matter of the PARADISER project, supported by an NSF grant through the CISE Database and Expert System Program. PARADISER is an environment providing a parallel rule language, called PARULEL built on top of the Sybase DBMS for parallel and distributed evaluation of rule programs for decision support and Active Database tasks.

The central idea we studied, implemented, analyzed and empirically evaluated is to model the run-time behavior of the inference system so that it would dynamically reorganize its own distributed computation when "imbalance" was detected at prescribed intervals in the evaluation process. This was accomplished by computing estimates of run-time computational costs using histograms that provide estimates of the distribution of data residing in the database. These estimates were able to evaluate the relative cost of executing a rule program over different partitions of the database in a manner similar to techniques that have been employed in relational database query optimizers for choosing the best query plan. The system utilized "triggers" in the underlying database management system to update and maintain the run-time histograms as inference computations proceeded at various sites. The system also models the relative computational performance among the participating sites by loading weights in order to assign future estimated workloads in a balanced manner to reduce large variances in completion times. The empirical evaluation of the system used a number of real-world applications that were coded in rule form, rather than toy problems with just a few exemplar rules. The results conclusively demonstrated that predictive load balancing increased utilization and overall speed-up and with low overhead.
BACK