Computer Science Thesis Proposal

— 5:30pm

In Person and Virtual - ET - Gates Hillman 8102

GIULIO ZHOU, Ph.D. Student, Computer Science Department, Carnegie Mellon University

Mitigating fragility in machine learning systems using structured models

Machine learning (ML) is increasingly used to drive applications in a variety of complex settings, such as web-scale search, content recommendation, autonomous vehicles, and language-based digital assistants. These ML systems have, in recent years, become predominantly data-driven, often underpinned by deep learning models that can effectively learn complex functions end-to-end from large amounts of available data. Since they make few assumptions about the learned function, neural networks are a flexible and effective tool for a wide range of tasks and environments. However, their purely data-driven nature also makes the learned solutions opaque, sample-inefficient, and potentially brittle.

To address these problems, there has been much work on imposing structure on ML models. Some approaches impose structure implicitly (e.g., through architecture design and data augmentation), while others impose structure explicitly, such as by incorporating latent priors, geometric constraints, and physical models. When suitably applied, explicit structure takes advantage of the powerful learning capabilities of deep learning models while avoiding the shortcomings of their end-to-end, black-box nature. Imposing such structure also improves model transparency and yields a framework for representing characteristics integral to the modeled task or environment, such as variability, periodicity, stationarity, smoothness, and monotonicity.

In this thesis, we explore approaches to improving the reliability and transparency of ML systems by imposing explicit structure in the form of simple parametric models. Compared to previous approaches that incorporate known domain-specific structure (e.g., based on physical models), we show that simple parametric models are widely applicable, i.e., to any system whose behavior can be well-approximated by these models. We explore this using three case studies in different ML systems. In our first work, we show how to build an effective ML-driven storage system by modeling the variability of large warehouse-scale storage systems using univariate log-normal distributions (parametrized by neural networks).

In our second work, we conduct user studies to show that the typical assumption of stationary rewards in bandit-based recommender systems does not hold in practice, demonstrating the importance of validating the structures underlying ML systems. Lastly, for our proposed work, we analyze the benefits of imposing latent space structure in variational autoencoder (VAE) models of text. To improve the reliability of semantic transformations (such as changing tense and sentiment), we propose quantifying the isotropy and smoothness of the VAE latent space and exploring transformation methods that take advantage of its unique geometry.

Thesis Committee:
David G. Andersen (Chair)
Zachary Lipton
J. Zico Kolter
Byron Wallace (Northeastern University)

In Person and Zoom Participation. See announcement.
