Doctoral Speaking Skills Talk - Ziyue Qiu

— 10:30am

Location:
In Person - Traffic21 Classroom, Gates Hillman 6121

Speaker:
ZIYUE QIU, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://ziyueqiu.github.io/

Hybrid cloud deployment of large-scale data analytics requires careful partitioning of data and jobs between on-premises and cloud sites to avoid massive networking costs. Moirai is a new framework that analyzes job logs, including which data each job accessed, to determine which data should reside at each site and which should be replicated. It also provides the job scheduler with table locations and predicted access sizes, so that the scheduler can choose where to execute each new job to minimize inter-site data fetching.
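To make the scheduler's side of this concrete, here is a minimal sketch, assuming each table is located at one or both of two sites and that the scheduler receives a predicted read size per table for the incoming job. The names `choose_site`, `SITES`, `predicted_reads`, and `table_location` are hypothetical illustrations, not Moirai's actual interface; the rule shown (run where the fewest bytes must cross sites) is a simplified stand-in, not the framework's scheduler.

```python
from typing import Dict, FrozenSet

# Hypothetical site names for a two-site hybrid deployment.
SITES = ("onprem", "cloud")


def choose_site(predicted_reads: Dict[str, int],
                table_location: Dict[str, FrozenSet[str]]) -> str:
    """Return the site where the job would fetch the fewest bytes remotely.

    predicted_reads: table name -> predicted bytes the job will read (assumed input).
    table_location: table name -> set of sites holding a copy; a replicated
    table is present at both sites.
    """
    def remote_bytes(site: str) -> int:
        # A table read is local if a copy (primary or replica) lives at `site`;
        # otherwise its bytes must be fetched across the inter-site link.
        return sum(n for table, n in predicted_reads.items()
                   if site not in table_location[table])

    # min() returns the first site in SITES when two sites tie.
    return min(SITES, key=remote_bytes)


locations = {
    "a": frozenset({"onprem"}),
    "b": frozenset({"cloud"}),
    "c": frozenset({"onprem", "cloud"}),  # replicated table
}
print(choose_site({"a": 100, "b": 30, "c": 50}, locations))
```

With these inputs, running on-premises would fetch only table `b` (30 bytes) remotely, while running in the cloud would fetch table `a` (100 bytes), so the sketch prints `onprem`. Replicated tables never add remote bytes, which is why replication decisions and job placement interact.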

Moirai's optimization scales to very large data corpora and minimizes dollar costs by exploiting recurring job templates to identify data interdependencies and per-job read volumes, and by ignoring dependencies on lightly used data to reduce optimizer complexity. Simulations driven by a 9-month trace of CorpX's Presto cluster (84M queries, 24EB data-read volume) show that Moirai can reduce dollar costs for an on-premises/in-cloud hybrid deployment by ≥95% relative to the state-of-the-art partitioning approach and by over 99.5% relative to other public approaches. The savings come from a 97–99.8% reduction in cloud egress, up to a 99% reduction in replication, and an 85–97% reduction in on-premises uplink requirements.
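The template idea can be illustrated with a small sketch, assuming query logs are available as (SQL text, per-table bytes read) records. Here a "template" is approximated by replacing string and numeric literals with `?`, so recurring jobs that differ only in parameters such as a date collapse to one key; tables whose total read volume falls below a threshold are then dropped from every template's dependency set. The functions `template_of` and `summarize` and the `min_table_bytes` threshold are hypothetical, and this literal-stripping heuristic is a simplification, not Moirai's published method.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def template_of(sql: str) -> str:
    """Hypothetical template key: strip literals, normalize whitespace and case."""
    s = re.sub(r"'[^']*'", "?", sql)          # string literals -> ?
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)    # standalone numbers -> ?
    return " ".join(s.split()).lower()


def summarize(log: Iterable[Tuple[str, Dict[str, int]]],
              min_table_bytes: int) -> Dict[str, Dict[str, float]]:
    """Map each template to its average per-job bytes read from each heavy table."""
    per_template = defaultdict(lambda: defaultdict(int))  # template -> table -> bytes
    runs = defaultdict(int)                               # template -> job count
    per_table = defaultdict(int)                          # table -> total bytes
    for sql, reads in log:
        key = template_of(sql)
        runs[key] += 1
        for table, nbytes in reads.items():
            per_template[key][table] += nbytes
            per_table[table] += nbytes
    # Lightly used tables are ignored so the optimizer tracks fewer dependencies.
    heavy = {t for t, b in per_table.items() if b >= min_table_bytes}
    return {key: {t: b / runs[key] for t, b in reads.items() if t in heavy}
            for key, reads in per_template.items()}


log = [
    ("SELECT * FROM sales WHERE day = '2024-01-01'", {"sales": 100, "dim": 1}),
    ("SELECT * FROM sales WHERE day = '2024-01-02'", {"sales": 120, "dim": 1}),
    ("SELECT count(*) FROM logs WHERE id > 5", {"logs": 500}),
]
print(summarize(log, min_table_bytes=10))
```

The two `sales` queries share one template with an average read of 110 bytes per job, and the `dim` table (2 bytes in total) is dropped as lightly used. Per-template averages like these are one plausible form of the access-size predictions the abstract says are handed to the scheduler.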

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement

