Mengzhi Wang

Performance Modeling of Storage Devices using Machine Learning
Degree Type: Ph.D. in Computer Science
Advisor(s): Anastassia Ailamaki
Graduated: December 2005

Abstract:

Performance models of storage devices make it possible to evaluate storage resource configurations efficiently, allowing systems to automatically search a large number of candidates before settling on an optimal or near-optimal one. This thesis explores the feasibility of using machine learning techniques to build such performance models. The models are constructed through "training", during which the model construction algorithm observes storage devices under a set of training traces and builds the models based on the observations. The main advantages of the approach are automated model construction and high efficiency in both computation and storage.

In our design, the models represent an I/O workload as vectors and model its performance on storage devices as functions over those vectors using a regression tool. We have identified the vector representation of workloads, the regression tool, and the training traces as three important factors in model quality. This thesis provides a thorough evaluation of existing techniques for addressing these issues. In addition, to augment existing work in workload characterization, we have proposed the entropy plot to characterize the spatio-temporal behavior of I/O workloads and the PQRS model to generate traces with given characteristics.
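As a rough illustration of the vector-plus-regression idea described above, the sketch below summarizes a window of I/O requests as a small feature vector and predicts latency from the nearest training vectors. Everything specific here is an assumption made for illustration and is not taken from the thesis: the request tuple layout, the four features, the names workload_vector, train, and predict, the latency values, and the use of nearest-neighbor averaging as a stand-in for the regression tool.

```python
from statistics import mean

# Hypothetical request record: (arrival_time_s, start_block, size_blocks, is_read).


def workload_vector(requests):
    """Summarize a window of requests as a fixed-length feature vector."""
    pairs = list(zip(requests, requests[1:]))
    gaps = [b[0] - a[0] for a, b in pairs]
    # A request counts as sequential if it starts where the previous one ended.
    sequential = sum(1 for a, b in pairs if b[1] == a[1] + a[2])
    return [
        mean(gaps) if gaps else 0.0,                        # mean inter-arrival time
        mean(r[2] for r in requests),                       # mean request size
        sum(1 for r in requests if r[3]) / len(requests),   # read fraction
        sequential / max(len(pairs), 1),                    # sequential fraction
    ]


def train(windows, latencies):
    """'Training' here only stores (vector, observed latency) pairs."""
    return [(workload_vector(w), y) for w, y in zip(windows, latencies)]


def predict(model, window, k=1):
    """Predict latency as the mean over the k nearest training vectors."""
    v = workload_vector(window)

    def squared_distance(u):
        # Features are not normalized; a real model would need to scale them.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    nearest = sorted(model, key=lambda pair: squared_distance(pair[0]))[:k]
    return mean(y for _, y in nearest)


# Two made-up training windows with made-up observed latencies (ms).
seq_reads = [(0.0, 0, 8, True), (0.01, 8, 8, True), (0.02, 16, 8, True)]
rnd_writes = [(0.0, 500, 8, False), (0.01, 20, 8, False), (0.02, 9000, 8, False)]
model = train([seq_reads, rnd_writes], [1.0, 8.0])

# A new window of sequential reads at a different disk location.
query = [(0.0, 100, 8, True), (0.01, 108, 8, True), (0.02, 116, 8, True)]
estimate = predict(model, query)
```

Because the query window has the same summary statistics as the sequential-read training window, the estimate is taken from that window's latency. The sketch also hints at why the choice of vector representation matters: any request correlation the features do not capture is invisible to the regression step.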

Our experiments on real-world traces have shown that the learning-based models are fast and accurate when the training and testing traces are similar. Offline training using synthetic traces, however, is less effective because the synthetic trace generators fail to capture the strong correlations between requests. Our error analyses have shown that both the vector representation and the synthetic trace generators have room for further improvement.

Thesis Committee:
Anastassia Ailamaki (Chair)
Anthony Brockwell
Christos Faloutsos
Gregory R. Ganger
John Wilkes (Hewlett Packard Laboratories)

Jeannette Wing, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Keywords:
Machine learning, learning-based performance models, storage devices, automation of model construction

CMU-CS-05-185.pdf (4.08 MB) (180 pages)