Praneeth Kacham

On Efficient Sketching Algorithms Degree Type: Ph.D. in Computer Science
Advisor(s): David P. Woodruff
Graduated: May 2024

Abstract:

Sketching refers to a wide variety of techniques to compress large datasets into much smaller forms that can be efficiently processed to answer questions about the original dataset. Over the past few decades, sketching has emerged as a key tool to efficiently handle large datasets in majorly three settings: (i) the Classic setting, in which the dataset is given to us and we want to solve a problem as quickly as possible, (ii) the Streaming setting, in which the underlying dataset is defined by a large stream of updates and we want to compute interesting properties of the dataset using a small amount of space, and (iii) the Distributed setting, in which the dataset of interest is split among multiple servers and we want protocols that use a small amount of communication among servers to solve problems of interest on the underlying dataset.

Each of the above settings presents a different challenge with regard to the measure of efficiency we are interested in. In this thesis, we study sketching algorithms in these three settings for a variety of problems. While the techniques required to obtain our algorithms differ across problems and settings, the underlying idea of data compression to convert the original large dataset into a much smaller form is a key ingredient behind all the results in this thesis.

Thesis Committee:
David Woodruff (Chair)
Pravesh K. Kothari
Richard Peng
Rasmus Pagh (University of Copenhagen)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Sketching, Streaming, Distributed Algorithms, Regression, Low Rank Approximation

CMU-CS-24-131.pdf (2.21 MB) ( 446 pages)
Copyright Notice