Daniel Lin-Kit Wong
Machine Learning for Flash Caching in Bulk Storage Systems
Degree Type: CS
Advisor(s): Gregory R. Ganger
Graduated: September 2024

Abstract:

Flash caches are used to reduce peak backend load for throughput-constrained data center services, reducing the total number of backend servers required. Bulk storage systems are a large-scale example: backed by high-capacity but low-throughput hard disks, they use flash caches to provide a cost-effective storage layer underlying everything from blobstores to data warehouses.

However, flash caches must cope with flash's limited write endurance by limiting the number of flash writes to avoid premature wear-out. Thus, most flash caches rely on admission policies to filter cache insertions and maximize the workload-reduction value of each write.

This dissertation evaluates and demonstrates potential uses of ML in place of traditional heuristic cache management policies for flash caches in bulk storage systems. The most successful elements of my research are embodied in a flash cache system called Baleen, which uses coordinated ML admission and prefetching to reduce peak backend load. After learning painful lessons with early ML policy attempts, I exploit a new cache residency model (episodes) to guide model training. I focus on optimizing an end-to-end metric (Disk-head Time) that measures backend load more accurately than IO miss rate or byte miss rate. Evaluation using 7-day Meta traces from 7 storage clusters shows that Baleen reduces Peak Disk-head Time (and hence the number of backend hard disks required) by 12% over state-of-the-art policies for a fixed flash write rate constraint.

I present a TCO (total cost of ownership) formula that weighs the cost of additional flash writes against reductions in Peak Disk-head Time, in terms of the flash drives and hard disks needed. Baleen-TCO chooses optimal flash write rates and reduces estimated TCO by 17%.

Workloads change over time, requiring that caches adapt to maintain performance.
I present a strategy for peak load reduction that adapts selectivity to load levels. I also evaluate workload drift and its impact on ML policy performance using 30-day Meta traces.

Baleen is the result of substantial exploration and experimentation with ML for caching. I present lessons learned from additional strategies considered and explain why they saw limited success on our workloads. These include enhancements for ML-based eviction, more complex ML models, and optimizing the use of DRAM in hybrid caches. I also present lessons from ML production deployments.

Code and traces are available via https://www.pdl.cmu.edu/CILES. These include our 7-day traces, which were the most extensive public collection of traces from a production bulk storage system at the time of writing.

Thesis Committee:
Gregory R. Ganger (Chair)
David G. Andersen
Nathan Beckmann
Daniel S. Berger (Microsoft Research / University of Washington)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords: Flash caching, machine learning for caching, machine learning for systems, bulk storage systems

CMU-CS-24-152 (127 pages)
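The abstract mentions a TCO formula that trades extra flash writes against lower Peak Disk-head Time, but does not state it. As a hedged illustration only, the Python sketch below assumes a simplified model: the Disk-head Time of a backend IO is a fixed positioning cost plus its size divided by transfer bandwidth; the hard disks needed scale with peak Disk-head Time per second; and the flash drives needed scale with the flash write rate divided by a per-drive endurance budget. All names (`disk_head_time`, `peak_disk_head_time`, `Option`, `tco`, `best_option`), parameters, and cost values are hypothetical and are not the thesis's formula or Baleen's code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical disk parameters: illustrative values, not taken from the thesis.
SEEK_S = 0.008          # assumed positioning time (seek + rotation) per backend IO, seconds
BANDWIDTH_BPS = 150e6   # assumed transfer rate, bytes per second


def disk_head_time(io_bytes: int) -> float:
    """Simplified Disk-head Time for one backend IO: positioning cost plus transfer cost."""
    return SEEK_S + io_bytes / BANDWIDTH_BPS


def peak_disk_head_time(misses_by_second: dict[int, list[int]]) -> float:
    """Peak over one-second windows of total Disk-head Time (disk-seconds per second).

    misses_by_second maps a timestamp (in seconds) to the sizes of the backend IOs
    (cache misses and prefetches) issued during that second.
    """
    return max(
        (sum(disk_head_time(b) for b in ios) for ios in misses_by_second.values()),
        default=0.0,
    )


@dataclass
class Option:
    write_rate_mbps: float  # flash write rate this policy setting produces (MB/s)
    peak_dt: float          # resulting peak Disk-head Time (disk-seconds per second)


def tco(opt: Option, hdd_cost: float, ssd_cost: float, ssd_write_budget_mbps: float) -> float:
    """Hypothetical TCO: hard disks sized for peak load plus SSDs sized for write endurance."""
    # Each disk head can supply at most 1 second of Disk-head Time per second.
    hdds = math.ceil(opt.peak_dt)
    # Enough SSDs that each stays within its sustainable (endurance-limited) write rate.
    ssds = math.ceil(opt.write_rate_mbps / ssd_write_budget_mbps)
    return hdds * hdd_cost + ssds * ssd_cost


def best_option(options: list[Option], hdd_cost: float, ssd_cost: float,
                ssd_write_budget_mbps: float) -> Option:
    """Pick the write-rate setting with the lowest estimated TCO."""
    return min(options, key=lambda o: tco(o, hdd_cost, ssd_cost, ssd_write_budget_mbps))


if __name__ == "__main__":
    # Made-up settings: writing more to flash lowers peak disk load, with diminishing returns.
    options = [Option(100, 120.0), Option(200, 100.0), Option(400, 95.0)]
    choice = best_option(options, hdd_cost=300, ssd_cost=1000, ssd_write_budget_mbps=50)
    print(choice.write_rate_mbps)  # 200
```

With these invented numbers, the three settings cost 38,000, 34,000, and 36,500 respectively, so the middle write rate wins: past some point, additional flash writes save fewer hard disks than they cost in flash drives. This is the kind of trade-off a TCO-driven choice of flash write rate must resolve, though the real model in the thesis may account for more factors (for example, flash capacity).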