Doctoral Thesis Oral Defense - Sara McAllister

— 5:00pm

Location:
In Person and Virtual - ET - Reddy Conference Room, Gates Hillman 4405 and Zoom

Speaker:
SARA McALLISTER, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
https://saramcallister.github.io/

Toward Sustainable Datacenters through Efficient Data Retrieval

Datacenters are projected to account for 33% of global carbon emissions by 2050. As datacenters increasingly rely on renewable energy for power, the majority of datacenter emissions will be embodied: emissions from lifecycle stages including acquiring raw materials, manufacturing, transportation, and disposal. To reach the ambitious emission reduction goals set by both companies and governments, datacenters need to reduce emissions throughout their operations, including (and particularly relevant for this thesis) the storage system. Unfortunately, while data storage and retrieval systems are large contributors to embodied emissions, reducing their embodied emissions has largely been overlooked.

This dissertation addresses how to reduce emissions in data retrieval for large-scale storage systems. These storage systems can reduce their carbon footprint by enabling storage devices to have longer lifetimes and use denser media. However, storage hardware's IO limits combined with software's unnecessary additional IO often severely restrict emission reductions, or at worst cause increased emissions. Thus, this thesis focuses on reducing IO in several parts of the storage stack to enable efficient and sustainable data retrieval.

First, this dissertation addresses the sustainability of flash caching, a critical layer in datacenter storage systems that is limited by flash write endurance. It does so through two caching systems: Kangaroo and FairyWREN. Together, these caches reduce writes by over 28x, allowing flash devices to use denser flash for longer lifetimes, ultimately reducing emissions. Then, this thesis discusses enabling more sustainable bulk storage, where bandwidth limitations prevent deployment of denser HDDs. Declarative IO, a new interface for distributed storage, empowers the storage system to eliminate duplicate IO accesses in maintenance tasks by exposing the time- and order-flexibility of those tasks. This work enables deployment of larger HDDs, further reducing emissions from storage systems.

Thesis Committee

Gregory R. Ganger (Co-Chair)
Nathan Beckmann (Co-Chair)
George Amvrosiadis
Daniel Berger (Microsoft Azure/University of Washington)
Margo Seltzer (University of British Columbia)

In Person and Zoom Participation. See announcement.

