Tiankai Tu

A Scalable Database Approach to Computing Delaunay Triangulations Degree Type: Ph.D. in Computer Science
Advisor(s): David R. O'Hallaron
Graduated: August 2008

Abstract:

As we ride on a wave of technological innovation towards the age of petascale computing, massive input data models and voluminous output data sets involved in large-scale parallel computer simulations and scientific instruments will soon (if not already) overwhelm our ability to generate, manipulate, and interpret them. Although efficient and effective for processing relatively small data sets, traditional main-memory--based algorithms lack the necessary scalability to deal with massive data sets that require more memory than that is available. To address the new data challenge, we must seek a more scalable solution.

Inspired by the success of database management systems in managing huge information stores for the commercial and governmental sectors, we propose a database approach to computing and managing large-scale scientific data sets. Although unstructured scientific data sets do not map to the tabular representation of a database system naturally, we resolve the mismatch by introducing a computational cache on top of a standard database buffer pool and providing a mechanism to translate data between the inherent unstructured representation (stored in the computational cache) and the native database format (stored in database pages). Scientific application codes then operate directly on the computational cache instead of on the native database format.

We have implemented our methodology in a prototype system called Abacus that targets Delaunay triangulations, a commonly used unstructured data representation used by many scientific and engineering applications. Abacus can not only store and index existing Delaunay triangulations but also generate large-scale Delaunay triangulations from scratch. Evaluation results show that:

  • Computing a Delaunay triangulation using Abacus is more than three orders of magnitude faster than an implementation that uses standard database techniques
  • The performance of Abacus matches that of the state-of-the-art 2D and 3D Delaunay triangulator (Triangle and Pyramid) when triangulating data sets that fit in main memory
  • Abacus achieves scalable performance even when triangulating data sets four orders of magnitude larger than the main memory (while other software has stopped working long ago)

Furthermore, we demonstrate the scalability of Abacus in the context of a grand challenge application---earthquake ground motion modeling, where we use Abacus to compute a series of large-scale 3D Delaunay triangulated finite element meshes with multi-billion tetrahedral elements.

Although some of the techniques presented in this dissertation are specific to Delaunay triangulations, the proposition of a database approach is more general; it points to a promising new way of how to leverage and extend existing database techniques to compute and manage large-scale unstructured scientific data sets of the coming era.

Thesis Committee:
David R. O’Hallaron (Chair)
Anastassia Ailamaki
Jacobo Bielak
Todd C. Mowry
Jonathan R. Shewchuk (University of California at Berkeley)

Peter Lee, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Keywords:
unstructured scientific data sets, tabular data representation, structural mismatch, computational cache, translation mechanism, Delaunay triangulation

CMU-CS-08-138.pdf (12.68 MB) ( 134 pages)
Copyright Notice