Computer Science Speaking Skills Talk

— 3:30pm

Location:
In Person - Gates Hillman 9115

Speaker:
DANIEL LIN-KIT WONG, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://wonglkd.fi-de.net/

Baleen: ML Admission & Prefetching for Flash Caches

Flash caches are used to reduce peak backend load for throughput-constrained data center services, reducing the total number of backend servers required. Bulk storage systems are a large-scale example: backed by high-capacity but low-throughput hard disks, they use flash caches to provide a more cost-effective storage layer underlying everything from blobstores to data warehouses.

However, flash caches must contend with the limited write endurance of flash by capping the long-term average flash write rate to avoid premature wearout. Most flash caches therefore use admission policies to filter cache insertions, attempting to maximize the workload-reduction value of each flash write.

In this talk, I will introduce the Baleen flash cache. Baleen uses coordinated ML admission and prefetching to reduce peak backend load. After learning painful lessons from our early ML policy attempts, we exploit a new cache residency model (which we call episodes) to guide model training. We focus on optimizing an end-to-end system metric (Disk-head Time) that measures backend load more accurately than IO miss rate or byte miss rate. Evaluation using Meta traces from seven storage clusters shows that Baleen reduces peak load by 11.8% over state-of-the-art policies. Baleen-TCO, which chooses an optimal flash write rate, reduces our estimated total cost of ownership (TCO) by 15.8%.

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement
