Daniel Anderson

Parallel Batch-Dynamic Algorithms Dynamic Trees, Graphs, and Self-Adjusting Computation Degree Type: Ph.D. in Computer Science
Advisor(s): Guy E. Blelloch
Graduated: August 2023

Abstract:

The defining feature of many modern large-scale computer systems is the sheer amount of data that they generate and process. Google's MapReduce clusters process over twenty petabytes of data per day, and Facebook has reported processing over 4 petabytes of data per day and running over one million map reduce computations over this data. Modern systems for processing this kind of data make extensive use of parallel and distributed computation to achieve the necessary level of throughput. An important property of the datasets involved in such computations, however, is that they are not static. Rather, they are frequently evolving. Large social networks, for example, undergo rapid changes; users are added, and links between uses are created and deleted rapidly, but each such update affects a relatively tiny portion of the data.

This thesis explores a relatively under-explored area of algorithm design intended to tackle these kinds of problems: parallel batch-dynamic algorithms. Classic algorithms for handling dynamic data support a single update at a time, but this model is insufficient for handling the scale of modern rapidly changing datasets, and furthermore yields little room to exploit parallelism within the updates. A batch-dynamic algorithm consumes multiple updates at a time, which allows for increased throughput and more opportunities for parallelism.

In the first part of the thesis, we will design algorithms for parallel batch- dynamic trees based on sequential Rake-Compress Trees (RC Trees). In the second part of this thesis, we will design algorithms for batch-dynamic graph connectivity and batch-incremental minimum spanning trees. Then, in Part Three, we show that batch-dynamic algorithms can be used as ingredients in designing highly-efficient parallel static algorithms. Using our parallel Rake-Compress Tree data structure as an ingredient, we obtain the first ever work-efficient parallel algorithm for minimum cuts.

The final part of this thesis will explore the implementation of systems for processing batch-dynamic data. One factor that hampers the adoption of efficient dynamic algorithms in practice is that they are often extremely complicated and difficult to implement. To address this issue in the sequential setting, self-adjusting computation is a technique for automatically dynamizing static algorithms to create dynamic ones. This lowers the implementation burden on programmers. In this thesis, we show that self-adjusting computation can be generalized to automatically dynamize parallel algorithms, allowing programmers to reap the benefits of parallel batch-dynamic algorithms without the burden of implementing them from scratch.

Thesis Committee:
Guy Blelloch (Chair)
Phillip Gibbons
Daniel Sleator
Julian Shun (Massachusetts Institute of Technology)
Valerie King (University of Victoria, Canada)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Dynamic algorithms, Parallel algorithms, Graph algorithms

CMU-CS-23-120.pdf (2.34 MB) ( 225 pages)
Copyright Notice