Doctoral Thesis Oral Defense - Honghao Lin

— 2:00pm

Location:
In Person and Virtual - ET - Reddy Conference Room, Gates Hillman 4405 and Zoom

Speaker:
HONGHAO LIN, Ph.D. Candidate
Computer Science Department
Carnegie Mellon University

https://honghlin.github.io/

Algorithms for Massive Data: Optimal Bounds, Adversarial Robustness, and Data-Driven Insights

With the rapid growth of massive datasets in areas such as machine learning and numerical linear algebra, classical algorithms are often no longer feasible. In this thesis, we develop provably efficient algorithms for a variety of problems in these settings, such as the streaming and distributed models. Our contributions span three directions:

  1. Optimal Bounds.  We introduce a general technique for lifting dimension lower bounds for real-valued linear sketches to polynomially bounded integer inputs. This leads to the first optimal sketching lower bounds for discrete data streams in fundamental problems such as frequency moment approximation, operator norm estimation, and compressed sensing. Beyond this, we also establish nearly-optimal bounds for a variety of streaming and sketching tasks, including ℓp subspace sketches for constant dimension d, ℓp regression in the arbitrary-partition distributed model, and graph problems such as approximating the minimum cut and constructing cut sparsifiers in balanced directed graphs.
  2. Adversarial Robustness.  While most streaming algorithms are studied in static worst-case models, many practical scenarios involve adaptive adversaries who generate inputs based on previous outputs. We present the first adaptive attack against linear sketches for ℓp-estimation over turnstile integer streams. Specifically, we show that any linear streaming algorithm with sketching matrix A ∈ ℤ^{r×n} can be broken using only poly(r log n) queries, with high constant probability. This result highlights fundamental limits of robustness in adaptive streaming. We also present our recent work on an adversarially robust F₂ estimation algorithm, based on a non-linear sketch, that achieves polylogarithmic space in turnstile streams.
  3. Learning-based Algorithms.  Classical algorithms guarantee correctness in the worst case but often ignore structure in real-world data, while machine learning methods leverage structure but typically lack guarantees. We design learning-based algorithms that incorporate machine learning predictions to adapt to input distributions, achieving faster runtimes, reduced space, or improved accuracy. Crucially, these algorithms retain rigorous worst-case guarantees even when the predictions are imperfect, bridging the gap between theory and data-driven practice.  
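To make the linear-sketch setting above concrete, here is a minimal AMS-style sketch for estimating the second frequency moment F₂ = Σᵢ fᵢ² over a turnstile stream of (index, delta) updates. This is a standard textbook construction offered purely as an illustration of what a linear sketch is; it is not one of the thesis's own constructions, and the parameter choices are arbitrary:

```python
import random
import statistics


class AMSF2Sketch:
    """A minimal AMS-style linear sketch estimating F2 = sum_i f_i^2
    of a turnstile stream of (index, delta) updates. Illustrative only."""

    def __init__(self, groups=10, per_group=10, universe=1000, seed=1):
        rng = random.Random(seed)
        self.groups, self.per_group = groups, per_group
        n = groups * per_group
        # Independent random +/-1 signs for each (counter, item) pair.
        self.signs = [[rng.choice((-1, 1)) for _ in range(universe)]
                      for _ in range(n)]
        self.counters = [0] * n

    def update(self, i, delta):
        # Linearity: every update adds sign * delta to each counter,
        # so the sketch handles arbitrary insertions and deletions.
        for j in range(len(self.counters)):
            self.counters[j] += self.signs[j][i] * delta

    def estimate(self):
        # Each squared counter is an unbiased estimate of F2;
        # median-of-means over groups controls the variance.
        means = []
        for g in range(self.groups):
            block = self.counters[g * self.per_group:(g + 1) * self.per_group]
            means.append(sum(c * c for c in block) / self.per_group)
        return statistics.median(means)


if __name__ == "__main__":
    sketch = AMSF2Sketch()
    freq = {3: 7, 42: -2, 100: 5}  # net frequencies after all updates
    for i, f in freq.items():
        sketch.update(i, f)
    true_f2 = sum(f * f for f in freq.values())  # 78
    print(true_f2, round(sketch.estimate(), 1))
```

Because the sketch is a linear map of the frequency vector, an adaptive adversary who observes its outputs can, as the attack in part 2 shows, learn enough about the sketching matrix to break any such estimator; this is exactly what motivates the non-linear F₂ sketch mentioned above.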

Thesis Committee
David P. Woodruff (Chair)
Yang P. Liu
Richard Peng
Jelani Nelson (University of California, Berkeley)

In Person and Zoom Participation.  See announcement. 

For More Information:
matthewstewart@cmu.edu

