Computer Science Thesis Oral

Wednesday, November 30, 2022 - 12:30pm to 2:30pm

Location:

In Person and Virtual - ET Reddy Conference Room, Gates Hillman 4405 and Zoom

Speaker:

ELLANGO JOTHIMURUGESAN, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University

Large-scale Machine Learning over Streaming Data

This thesis introduces new techniques for efficiently training machine learning models over continuously arriving data to achieve high accuracy, even when the data distribution changes over time, a phenomenon known as concept drift. First, we address the case of IID data with STRSAGA, an optimization algorithm based on variance-reduced stochastic gradient descent that can incorporate incrementally arriving data and efficiently converges to statistical accuracy. Second, we address the case of non-IID data over time with DriftSurf. Previous work on drift detection generally relies on magic thresholds, making it less practical without prior knowledge of the magnitude and rate of change. DriftSurf improves the robustness of traditional change detection tests through a stable-state/reactive-state process, and attains higher statistical accuracy whenever an efficient optimizer like STRSAGA is used. Third, we address the case of data that is non-IID both over time and across clients (distributed in space) in the federated learning setting with FedDrift. Previous centralized drift adaptation methods and previous personalized federated learning methods are ill-suited for staggered drifts. FedDrift is the first algorithm explicitly designed for both dimensions of heterogeneity; it identifies distinct concepts by learning a time-varying clustering, which enables accurate collaborative training despite drifts. We show that the presented algorithms are effective through theoretical competitive analyses and experimental studies that demonstrate higher accuracy than the prior state of the art on benchmark datasets.

Thesis Committee:

Phillip B. Gibbons (Chair)
Gauri Joshi
Virginia Smith
Kevin Hsieh (Microsoft)

In Person and Zoom Participation. See announcement.

Keywords:

Thesis Oral