Computer Science Thesis Oral

— 5:00pm

In Person and Remote - ET - Traffic21 Classroom, Gates Hillman 6501 and Zoom

Ph.D. Candidate, Computer Science Department, Carnegie Mellon University

Designing Storage Codes for Heterogeneity: Theory and Practice

Distributed storage systems support many essential applications, and thus need to be highly reliable. To achieve this at low cost, most systems use erasure codes. The parameters of the erasure code (which affect the cost and level of protection) are set based on the expected operating conditions. However, conditions vary significantly across time and across the system. For example, failure rates, workloads, and density of devices can change over time and differ across locations. Many existing systems fail to accommodate these variations, or do so inefficiently. My thesis focuses on making distributed storage systems more robust and efficient by enabling them to automatically adapt to these variations. To make progress towards this goal, I develop and use tools from both Coding Theory and Computer Systems research.

The first part focuses on variations in the system across time. Our main contribution here is the "convertible codes" framework, designed to study and construct erasure codes that can efficiently change their parameters over time. We propose the framework, derive the fundamental limits of the conversion problem, and design optimal codes. Additionally, we propose two distributed storage system designs that automatically decide when and how to convert between codes.

The second part focuses on heterogeneity across the system. Specifically, we consider a geo-distributed storage system, where the density of nodes and the latencies between nodes vary significantly, and the cost of sending data across the wide-area network (WAN) is crucial. Our main contribution is a new class of codes that optimizes both the storage overhead and the WAN bandwidth given the parameters of the system. We additionally propose a new strongly-consistent geo-distributed storage system that jointly optimizes its consensus protocol and erasure code.

Thesis Committee:

Rashmi Vinayak (Chair)
Gregory R. Ganger
Ryan O'Donnell
Muriel Médard (Massachusetts Institute of Technology)

In Person and Zoom Participation. See announcement.
