Computer Science Thesis Proposal

Friday, May 6, 2022 - 1:00pm to 2:30pm


In Person: McWilliams Classroom, Gates Hillman 4303


MICHAEL KUCHNIK, Ph.D. Student, Computer Science Department, Carnegie Mellon University

Beyond Model Efficiency: Data Optimizations for Machine Learning Systems

The field of machine learning has exploded due to the increased availability of data, compute, and algorithms. Systems built to support machine learning models have primarily focused on the compute path of the model itself. This thesis proposes to investigate the role of the data path in both training and validation. In the first part of the thesis, we focus on training data, illustrating that the training data pipeline is a prime target for performance optimization. To help address performance issues, we introduce a form of training-pipeline subsampling, a reduced-fidelity disk format, and a system for automatically tuning data pipeline performance knobs. In the second part, we propose to turn to the validation set, developing a system for automatically querying and validating a language model’s behavior. We conclude with thoughts on how upcoming generations of machine learning systems can expose data-friendly interfaces.

Thesis Committee:
George Amvrosiadis (Co-Chair)
Virginia Smith (Co-Chair)
Greg Ganger
Tianqi Chen
Paul Barham (Google)
