Parallel Data Laboratory Talk

Thursday, July 22, 2021 - 12:00pm to 1:15pm

Location:

Virtual Presentation - ET Remote Access - Zoom

Speaker:

PRASHANTH MENON, Senior Software Engineer https://www.linkedin.com/in/prasmenon/

On Building Robustness into Compilation-Based Main-Memory Database Query Engines

Relational database management systems (DBMSs) are the bedrock upon which modern data processing applications are assembled. Critical to ensuring low-latency queries is the efficiency of the DBMS's query processor. Just-in-time (JIT) query compilation is a popular technique to improve analytical query processing performance. However, a compiled query cannot overcome poor choices made by the DBMS's optimizer. Garbage in, garbage out. Poor query plans arise for many reasons, and although previous work has explored techniques to compensate for inadequate plans, none work in DBMSs that rely on compiling queries.

In this talk, I will present multiple effective, practical, and complementary techniques to build runtime adaptivity into compilation-based engines with negligible overhead. First, I will propose a method that blends two otherwise disparate query processing approaches (compilation and vectorization) into one engine. Next, I will present a framework that builds upon our previous work to allow the DBMS to modify compiled queries without recompiling the plan or generating code speculatively. This technique enables larger groups of operators in a query to coordinate their optimization process. Finally, I will present a method that decomposes query plans into fragments that can be compiled and executed independently. This not only reduces compilation overhead but enables the DBMS to learn properties about data processed in an earlier phase of the query to hyper-optimize the code it generates for later phases.

Collectively, these techniques enable any compilation-based DBMS to achieve dynamic runtime robustness without incurring the overheads that such adaptivity usually brings.

Prashanth Menon is a Senior Software Engineer at Databricks, where he is designing the next-generation execution infrastructure supporting the large-scale and diverse workloads in the Spark ecosystem. Prior to joining Databricks, Prashanth completed his PhD at CMU in 2021, working with Andy Pavlo and Todd Mowry on databases.

Zoom Participation. See announcement.

Event Website:

https://www.pdl.cmu.edu/talk-series/2021/072221.shtml

Keywords:

Talks