Database Algorithms for Modern CPU Memory Hierarchies  

Principal Investigator: Kenneth A. Ross

A modern database server typically runs on a cluster of machines with a large amount of RAM, to ensure fast query responses when the commonly accessed data (or even the whole database) can fit in main memory. Data may reside in remote or local memory, or in one of several levels of cache; each type of memory has its own characteristic size and performance properties. The project will develop query processing algorithms and a query processing system tailored to such memory hierarchies. Analytic database systems have emerged as a key technology for extracting actionable information from big data collections ranging from medicine to science to business. Improvements in the performance of core database operations achieved by the project will have impact on many application domains. The project will also contribute to education by contributing technology for use in teaching database system implementation techniques.

The project will develop a database architecture with efficient partitioning as the core operation, using different partitioning techniques at each level of the memory hierarchy. In-place partitioning will be employed where appropriate to avoid allocating extra memory, with only a small decrease in partitioning speed. Partitioning will be used as a building block for other operators such as sorting, joins and aggregation. A variety of database algorithms will be implemented, taking advantage of the efficiency of partitioning to perform well at scale. Network transfer volumes will be reduced using specialized join algorithms, since such transfers are likely to be the bottleneck for queries involving large distributed joins. These algorithms will form the basis of a database system prototype to be developed during the course of the project. The project will provide new techniques to exploit modern machines for efficient analytic query processing. The proposed system will significantly improve the throughput of data-intensive queries.



This material is based in part upon work supported by the National Science Foundation under grant IIS-1422488.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.