Charles Garrod

Putting the "Scalability" into Database Scalability Services

Degree Type: Ph.D. in Computer Science
Advisor(s): Bruce M. Maggs, Christopher Olston
Graduated: August 2008

Abstract:

Applications deployed on the Internet are immediately accessible to a vast population of potential users, and as a result they tend to experience unpredictable and widely fluctuating demand. System administrators currently face a provisioning dilemma in addressing this demand: either (1) waste money by heavily overprovisioning systems, or (2) risk loss of availability during times of high demand. This problem is largely solved for static Web content, but existing approaches do not apply well to dynamic content produced by data-intensive Web applications for which a central database server limits scalability.

To address this problem, we design and build a Database Scalability Service (DBSS), which can offer scalability to data-intensive Web applications as a third-party service, much like Content Delivery Networks currently scale static Web content. The key challenge in building a DBSS is to off-load database requests from a content provider's central database server while ensuring that the DBSS uses up-to-date, consistent data as the database is updated. A DBSS must also maintain high quality of service for each content provider and guarantee the privacy of each content provider's data.

In this thesis, we focus on the scalability-related aspects of a DBSS. We design and evaluate a DBSS, called Ferdinand, that uses a multi-tiered caching architecture, with a local database cache at each Ferdinand server and a shared, collaborative cache distributed among Ferdinand's nodes. Ferdinand maintains the consistency of the database caches using a fully distributed publish/subscribe system, notifying each cache of database updates without placing additional administrative load on the central database server.
Our primary technique for efficiently maintaining cache consistency is to specialize the publish/subscribe system for each Web application, using an offline analysis of the Web application and its database requests. We use compiler-like techniques to advance the state of the art in this offline analysis, allowing us to better understand how a Web application is affected by updates to its database. Overall, we show that our multi-tiered cache design and scalable consistency management are both critical to maximizing Ferdinand's scalability, and that the Ferdinand DBSS can scale the throughput of data-intensive Web applications by more than a factor of 10.

Thesis Committee:
Bruce M. Maggs (Chair)
David Andersen
Anthony Tomasic
Christopher Olston (Yahoo! Research)
Mike Dahlin (University of Texas at Austin)

Peter Lee, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Keywords: Web applications, scalability, publish/subscribe, view invalidation, view materialization, database query result caching

CMU-CS-08-150.pdf (740.58 KB) (147 pages)