Nikolaos Hardavellas

Chip Multiprocessors for Server Workloads

Degree Type: Ph.D. in Computer Science
Advisor(s): Babak Falsafi, Anastasia Ailamaki
Graduated: August 2009

Abstract:

We stand on the cusp of the giga-scale era of chip integration. Technological advancements in semiconductor fabrication yield ever-smaller and faster devices, enabling billion-transistor chips with multi-gigahertz clock frequencies. To utilize the abundant transistors on chip, modern processors pack an exponentially increasing number of cores on chip, multi-megabyte caches, and large interconnects to facilitate intra-chip data transfers. However, the growing on-chip resources do not directly translate into a commensurate increase in performance. Rather, they come at the cost of increased on-chip data access latency, while thermal considerations and pin constraints limit the parallelism that a multicore chip can support.

To mitigate the increasing on-chip data access latency, cache blocks on chip should be placed close to the cores that use them. We observe that cache access patterns can be classified at run time into distinct classes with different on-chip block placement requirements. Based on this observation, we propose Reactive NUCA (R-NUCA), a distributed cache design which reacts to the class of each access to place blocks close to the requesting cores.

We then explore the design space of physically-constrained multicore processors, and find that future multicores should utilize low-operational-power transistors even for time-critical components (e.g., cores) to ease the power wall, employ novel on-chip block placement techniques to utilize efficiently large caches, while techniques like 3D-stacked memory can mitigate the off-chip bandwidth constraint even for peak-performance designs. Moving forward, we find that heterogeneous multicores hold great promise in improving designs even further.

Thesis Committee:
Babak Falsafi (Co-Chair)
Anastasia Ailamaki (Co-Chair)
David R. O’Hallaron
Todd C. Mowry
Luiz André Barroso (Google)

Keywords: computer architecture, cache, multicore, chip multiprocessors, data placement, chip design, NUCA, commercial server workloads, performance modeling, design-space exploration

CMU-CS-09-150.pdf (1.76 MB) (154 pages)
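
The abstract's central idea, classifying each cache access at run time and placing the block according to its class, can be illustrated with a toy sketch. This is not the thesis's mechanism: the three class names, the sharing-based classification rule, the placement rules, and the names `AccessClass`, `NUM_SLICES`, `classify`, and `place` are all hypothetical, chosen only to show the shape of a class-driven placement decision.

```python
from enum import Enum


class AccessClass(Enum):
    # Hypothetical access classes; the abstract does not enumerate them.
    INSTRUCTION = "instruction"
    PRIVATE_DATA = "private_data"
    SHARED_DATA = "shared_data"


# Assumption: a tiled chip with one cache slice per core, 16 cores in total.
NUM_SLICES = 16


def classify(is_instruction_fetch, cores_that_touched_block):
    """Toy run-time classifier (illustrative rule, not taken from the thesis)."""
    if is_instruction_fetch:
        return AccessClass.INSTRUCTION
    if len(cores_that_touched_block) <= 1:
        return AccessClass.PRIVATE_DATA
    return AccessClass.SHARED_DATA


def place(access_class, requesting_core, block_address):
    """Return the index of the cache slice that should hold the block."""
    if access_class is AccessClass.SHARED_DATA:
        # Data used by several cores gets one chip-wide home slice,
        # picked here by simple address interleaving.
        return block_address % NUM_SLICES
    # Simplification: private data and instructions stay in the
    # requesting core's own slice, the closest one to that core.
    return requesting_core
```

For instance, `place(AccessClass.SHARED_DATA, requesting_core=3, block_address=37)` returns slice 5 under these assumptions, while a private-data access from core 3 stays in slice 3.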