Kristen Gardner

Modeling and Analyzing Systems with Redundancy

Degree Type: Ph.D. in Computer Science
Advisor(s): Mor Harchol-Balter
Graduated: May 2017

Abstract:

Reducing latency is a primary concern in computer systems. As cloud computing and resource sharing become more prevalent, the problem of how to reduce latency becomes more challenging because server speeds are highly variable. Recent computer systems research has shown that the same job can take 12 or even 27 times longer to run on one machine than on another, due to varying background load, garbage collection, network contention, and other factors. This server variability is transient and unpredictable, making it hard to know how long a job will take to run on any given server, and therefore how best to dispatch and schedule jobs.

An increasingly popular strategy for combating server variability is redundancy: create multiple copies of the same job, dispatch the copies to different servers, and wait for the first copy to complete service. A great deal of empirical computer systems research has demonstrated the benefits of this approach, showing up to a 50% reduction in mean response time. Unfortunately, there is very little theoretical work analyzing performance in systems with redundancy.
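As a rough illustration of why the strategy works (my own sketch, not material from the thesis): a replicated job finishes when its fastest copy does, so its completion time is the minimum of several runtimes. With i.i.d. exponential runtimes, the minimum of d copies is itself exponential with d times the rate, so the mean completion time drops by a factor of d.

```python
# Rough illustration (not from the thesis): a job replicated to d servers
# finishes when its fastest copy does, so its completion time is the
# minimum of d runtimes. With i.i.d. Exp(1) runtimes, that minimum is
# Exp(d), so the mean completion time falls from 1 to 1/d.
import random

def replicated_runtime(d):
    """Completion time of one job whose d copies run independently."""
    return min(random.expovariate(1.0) for _ in range(d))

def mean_runtime(d, trials=100_000):
    return sum(replicated_runtime(d) for _ in range(trials)) / trials

for d in (1, 2, 4):
    print(f"d={d}: simulated mean {mean_runtime(d):.3f} vs. theory {1/d:.3f}")
```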

This thesis presents the first exact analysis of response time in systems with redundancy. We begin in the Independent Runtimes (IR) model, in which a job's service times (runtimes) are assumed to be independent across servers. Here we derive exact expressions for the distribution of response time in a certain set of class-based redundancy systems. We also propose two new scheduling policies, Least Redundant First (LRF) and Primaries First (PF), and prove that LRF minimizes overall system response time, while PF is fair across classes of jobs with different redundancy degrees.
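One known consequence of the IR model, from the line of work this thesis builds on, is that a fully redundant system (every job replicated to all k FCFS servers, with i.i.d. exponential copy runtimes) behaves exactly like an M/M/1 queue with service rate k·mu. The sketch below, my own illustration with hypothetical parameters, checks that equivalence by simulation.

```python
# Sketch (my own illustration, hypothetical parameters) of a known
# consequence of the IR model: if every job is replicated to all k FCFS
# servers and copy runtimes are i.i.d. Exp(mu), the fastest copy takes
# Exp(k*mu) time and jobs depart in FCFS order, so the system is
# equivalent to an M/M/1 queue with service rate k*mu, where
# E[T] = 1 / (k*mu - lambda).
import random

def simulate_fully_redundant(lam, mu, k, num_jobs=200_000):
    arrival = 0.0        # arrival clock
    prev_departure = 0.0 # departure time of the previous job
    total_response = 0.0
    for _ in range(num_jobs):
        arrival += random.expovariate(lam)  # Poisson arrivals at rate lam
        # Effective service time: minimum of k i.i.d. Exp(mu) copy runtimes.
        service = min(random.expovariate(mu) for _ in range(k))
        departure = max(arrival, prev_departure) + service  # FCFS recursion
        total_response += departure - arrival
        prev_departure = departure
    return total_response / num_jobs

lam, mu, k = 1.0, 0.4, 4
print("simulated mean response time :", round(simulate_fully_redundant(lam, mu, k), 3))
print("M/M/1 prediction 1/(k*mu-lam):", round(1.0 / (k * mu - lam), 3))
```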

While the IR model is appropriate in certain settings, in others it does not make sense because the independence assumption eliminates any notion of an "inherent job size." The IR model leads to the conclusion that more redundancy is always better, which often is not true in practice. We therefore propose the S&X model, the first model to decouple a job's inherent size (X) from the server slowdown (S). This model is important because, unlike prior models, it allows a job's runtimes to be correlated across servers. The S&X model makes it evident that redundancy does not always help: in fact, too much redundancy can lead to instability. To overcome this, we propose a new dispatching policy, Redundant-to-Idle-Queue (RIQ), which is provably stable in the S&X model while offering substantial response time improvements compared to systems without redundancy.
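A back-of-the-envelope sketch of where the instability comes from (my own, with hypothetical parameters; it assumes all d copies of a job start in sync, ignoring queueing effects): in the S&X model a copy's runtime on a server is S·X, where the inherent size X is shared by every copy and the slowdown S ≥ 1 is drawn independently per server. With cancel-on-completion, a job replicated to d servers occupies all d of them until its fastest copy finishes, consuming d·X·min(S_1,…,S_d) of total server time. Since S ≥ 1, that work grows roughly linearly in d, so enough redundancy overloads any fixed set of servers; RIQ sidesteps this by creating extra copies only at idle servers.

```python
# Back-of-the-envelope sketch (hypothetical parameters; assumes all d
# copies start in sync, ignoring queueing effects). In the S&X model a
# copy's runtime on a server is S * X: the inherent size X is shared by
# every copy, while the slowdown S >= 1 is drawn independently per
# server. With cancel-on-completion, a replicated job consumes
# d * X * min(S_1, ..., S_d) of total server time, which grows roughly
# linearly in d.
import random

def mean_work_per_job(d, trials=100_000):
    total = 0.0
    for _ in range(trials):
        x = random.expovariate(1.0)                            # inherent size X ~ Exp(1)
        s = [1.0 + random.expovariate(2.0) for _ in range(d)]  # slowdowns, S >= 1
        total += d * x * min(s)                                # server time consumed
    return total / trials

k, lam = 10, 5.0  # k servers, Poisson arrival rate (hypothetical values)
for d in (1, 2, 4, 8):
    rho = lam * mean_work_per_job(d) / k  # offered load per unit of capacity
    flag = "  <- unstable" if rho >= 1 else ""
    print(f"d={d}: utilization {rho:.2f}{flag}")
```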

Thesis Committee:
Mor Harchol-Balter (Chair)
Guy Blelloch
Anupam Gupta
Alan Scheller-Wolf (Tepper)
Rhonda Righter (University of California, Berkeley)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science

Keywords:
Redundancy, replication, task assignment, dispatching, scheduling, queueing theory, stochastic processes, Markov models, RIQ, Redundancy-d, LRF, PF

CMU-CS-17-112.pdf (2.33 MB, 170 pages)