Database Algorithms for Modern CPU Memory Hierarchies

Principal Investigator: Kenneth A. Ross

A modern database server typically runs on a cluster of machines with a large amount of RAM, to ensure fast query responses when the commonly accessed data (or even the whole database) can fit in main memory. Data may reside in remote or local memory, or in one of several levels of cache; each type of memory has its own characteristic size and performance properties. The project will develop query processing algorithms and a query processing system tailored to such memory hierarchies. Analytic database systems have emerged as a key technology for extracting actionable information from big data collections ranging from medicine to science to business. Improvements in the performance of core database operations achieved by the project will have impact on many application domains. The project will also contribute to education by contributing technology for use in teaching database system implementation techniques.

The project will develop a database architecture with efficient partitioning as the core operation, using different partitioning techniques at each level of the memory hierarchy. In-place partitioning will be employed where appropriate to avoid allocating extra memory, with only a small decrease in partitioning speed. Partitioning will be used as a building block for other operators such as sorting, joins and aggregation. A variety of database algorithms will be implemented, taking advantage of the efficiency of partitioning to perform well at scale. Network transfer volumes will be reduced using specialized join algorithms, since such transfers are likely to be the bottleneck for queries involving large distributed joins. These algorithms will form the basis of a database system prototype to be developed during the course of the project. The project will provide new techniques to exploit modern machines for efficient analytic query processing. The proposed system will significantly improve the throughput of data-intensive queries.

Participants

Publications

Rethinking SIMD Vectorization for In-Memory Databases. O. Polychroniou, A. Raghavan, K. A. Ross. SIGMOD 2015. link
Efficient Lightweight Compression Alongside Fast Scans. O. Polychroniou, K. A. Ross, Proceedings of the 2015 Workshop on Data Management on New Hardware. link
SIMD-Accelerated Regular Expression Matching. E. Sitaridi, O. Polychroniou, K. A. Ross, Proceedings of the 2016 Workshop on Data Management on New Hardware. link
Deadlock-free joins in DB-mesh, an asynchronous systolic array accelerator. B. Cao, K. A. Ross, S. A. Edwards, M. A. Kim, Proceedings of the 2017 Workshop on Data Management on New Hardware. link
Distributed Joins and Data Placement for Minimal Network Traffic. O. Polychroniou, W. Zhang, K. A. Ross, ACM Transactions on Database Systems 43(3), November 2018. link

Software (supported by this grant)

Code for Regular Expression Matching using SIMD Based on our DaMoN 2016 paper. link
Code for SIMD Vectorization Based on our SIGMOD 2015 paper. link
Code for Efficient Lightweight Compression Based on our DaMoN 2015 paper. link

Related Software

Code for Partitioning and Sorting Based on our SIGMOD 2014 paper. link
Code for Vectorized Bloom Filters Based on our DaMoN 2014 paper. link

This material is based in part upon work supported by the National Science Foundation under grant IIS-1422488.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.