For my dissertation, I have implemented the Bernoulli compiler which automatically generates sparse matrix codes, given the corresponding dense codes and a specification of the data formats. The main problems are the restructuring of computations to produce efficient simultaneous enumeration over multiple data structures, and the development of a proper interface for the specification of sparse matrix formats.
The key insight to solving these problems is that we can view sparse and dense arrays as relations, and the execution of nested loops as the evaluation of certain queries over these relations. Storage formats are specified to the compiler through search and enumeration access methods. Code restructuring is then formulated as a search for the most efficient plan for the query. This software architecture not only allows for a clean design of the compiler, but it also exposes additional opportunities for code optimization and has led us to more general transformation algorithms than previously reported in the literature.
Experiments on the IBM SP-2 show that the performance of the code generated by the Bernoulli compiler for common numerical kernels is within 2-4% of the performance of hand-written libraries. The compiler is being used in a Grand Challenge project at Cornell on fracture mechanics simulation.