Doctoral Thesis Proposal - Timothy Kim

May 15, 2026, 1:00PM–2:30PM

Location:
4405 - Gates and Hillman Centers

Speaker:
TIMOTHY KIM, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://github.com/TKim18

Efficient Data Storage Provisioning, Placement, and Transitions at Scale

Exascale storage systems are under increasing pressure to store more data while providing greater performance per byte. Historically, hyperscale storage systems have relied on a two-tiered hierarchy: hard-disk drives store most bytes, while smaller flash tiers absorb requests for hot and performance-critical data. This design is becoming increasingly difficult to sustain. Datacenter data is getting warmer with the proliferation of AI/ML and analytics-heavy workloads, while storage devices are becoming denser without proportional improvements in per-byte performance or endurance. As a result, enabling denser storage devices at exascale requires improving both the software storage system and the hardware provisioning strategies for modern datacenter workloads.

This thesis shows that exascale storage systems can enable denser storage options by jointly reducing internal IO and improving data placement and hardware provisioning decisions. The first part of this thesis, Morph, reduces the IO associated with lifetime redundancy transitions. Morph introduces a novel hybrid redundancy scheme for early-life data and a system designed around a new erasure code for late-life data, reducing ingest and transcode overheads. The second part of this thesis develops a total-cost-of-ownership (TCO) model and optimizer for exascale storage provisioning. This model determines the minimum-cost set of hardware necessary to serve datacenter workloads and shows how heterogeneous configurations can cost-effectively enable dense devices in modern datacenters.

We propose work that connects these two directions by modeling storage workloads from the bottom up, using fine-grained lifetime and temperature transition behavior to reason about provisioning and placement decisions. By understanding how data cools and survives throughout its lifetime, this work aims to produce a more precise framework that co-optimizes the heterogeneous storage mixture and the placement of data across the storage tiers. Together, these contributions show how storage systems can cost-effectively service contemporary storage workloads with denser media and enable the massive growth in data demand.

Thesis Committee

Greg Ganger (Co-Chair)
Rashmi Vinayak (Co-Chair)
George Amvrosiadis
Saurabh Kadekodi (Google)

Additional Information 

Contact
Matt Stewart

