Doctoral Thesis Proposal - Wan Shen Lim

April 8, 2025, 11:30am - 1:00pm
Location: In Person - Gates Hillman 6115
Speaker: WAN SHEN LIM, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://wanshenl.me/official/

Database Gyms: Towards Autonomous Database Tuning

Database management systems (DBMSs) are the foundation of modern data-intensive applications. But as more features are developed to support new workloads, they become increasingly complex and difficult to configure. Decades of research on autonomous DBMS configuration have largely produced advisory tools that still rely on human expertise for their deployment into database tuning pipelines. Using these tools involves a multi-step process where a human operator (1) determines an optimization objective, (2) selects a suitable tool to improve the objective, (3) sets up and configures the DBMS to run a particular workload, (4) runs the workload to collect telemetry, (5) uses the collected telemetry to calibrate the tool, and (6) operates the tool to obtain recommendations, which the operator must then review and apply. Because of the ad-hoc nature of these pipelines, they require significant human effort to set up, extend, and deploy. Moreover, these tools are difficult to compose and swap.

This proposal presents the database gym, an integrated framework that systematizes and automates the DBMS configuration pipeline. The gym eliminates repetition in the setup and operation of such pipelines by providing a set of reusable, interoperable, and interchangeable components that simplify the development and integration of ML-driven DBMS configuration tools. It leverages its complete control over the tuning process to enable optimizations that require end-to-end knowledge. First, it eliminates step-level repetition by skipping over redundant computation during telemetry collection to reduce the latency of the tuning pipeline.
Next, it eliminates pipeline-level repetition by reusing past experience to improve tool performance. For example, it adapts models of DBMS behavior to account for how operator semantics differ across DBMS versions. We propose to extend our preliminary work by developing a tool for DBMS upgrades that uses version-aware models to predict performance improvements and regressions, addressing another database administration task with significant human involvement. Lastly, we will leverage recent advances in agentic artificial intelligence to orchestrate tools on behalf of a human operator. These efforts will transform the database gym from a platform for developing and deploying DBMS configuration tools into an autonomous database administrator for production environments.

Thesis Committee
Andrew Pavlo (Chair)
Jignesh Patel
David Andersen
Lin Ma (University of Michigan)