Computer Science Thesis Oral
In Person - Reddy Conference Room, Gates Hillman 4405
DANIEL LIAM ANDERSON
Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
Parallel Batch-Dynamic Algorithms: Dynamic Trees, Graphs, and Self-Adjusting Computation
The defining feature of many modern large-scale computer systems is the sheer amount of data that they generate and process. In 2008, it was reported that Google's MapReduce clusters process over twenty petabytes of data per day. Facebook has reported processing over 4 petabytes of data per day and running over one million map-reduce computations over this data. Modern systems for processing this kind of data make extensive use of parallel and distributed computation to achieve the necessary level of throughput. An important property of the datasets involved in such computations, however, is that they are not static. Rather, they are frequently evolving. Large social networks, for example, undergo rapid changes; users are added, and links between uses are created and deleted rapidly, but each such update affects a relatively tiny portion of the data.
This thesis explores a relatively under-explored area of algorithm design intended to tackle these kinds of problems: parallel batch-dynamic algorithms. Classic algorithms for handling dynamic data support a single update at a time, but this model is insufficient for handling the scale of modern rapidly changing datasets, and furthermore yields little room to exploit parallelism within the updates. A batch-dynamic algorithm consumes multiple updates at a time, which allows for increased throughput and more opportunities for parallelism.
In the first part of the thesis, we will design algorithms for parallel batch-dynamic trees based on Rake-Compress Trees (RC-Trees). In the second part of this thesis, we will design algorithms for batch-dynamic graph connectivity and batch-incremental minimum spanning trees. Then, in Part Three, we show that batch-dynamic algorithms can be used as ingredients in designing highly-efficient parallel static algorithms. Using our parallel RC-Tree data structure as an ingredient, we obtain the first ever work-efficient parallel algorithm for minimum cuts. The final part of this thesis will explore the implementation of systems for processing batch-dynamic data. We show that self-adjusting computation can be generalized to automatically dynamize parallel algorithms, allowing programmers to reap the benefits of parallel batch-dynamic algorithms without the burden of implementing them from scratch.
Guy Blelloch (Chair)
Julian Shun (Massachusetts Institute of Technology)
Valerie King (University of Victoria, Canada)
In Person and Zoom Participation. See announcement.