Gennady G. Pekhimenko Practical Data Compression for Modern Memory Hierarchies Degree Type: Ph.D. in Computer Science Advisor(s): Todd Mowry, Onur Mutlu Graduated: August 2016 Abstract: Although compression has been widely used for decades to reduce file sizes (thereby conserving storage capacity and network bandwidth when transferring files), there has been limited use of hardware-based compression within modern memory hierarchies of commodity systems. Why not? Especially as programs become increasingly data-intensive, the capacity and bandwidth within the memory hierarchy (including caches, main memory, and their associated interconnects) have already become increasingly important bottlenecks. If hardware-based data compression could be applied successfully to the memory hierarchy, it could potentially relieve pressure on these bottlenecks by increasing effective capacity, increasing effective bandwidth, and even reducing energy consumption. In this thesis, we describe a new, practical approach to integrating hardware-based data compression within the memory hierarchy, including on-chip caches, main memory, and both on-chip and off-chip interconnects. This new approach is fast, simple, and effective in saving storage space. A key insight in our approach is that access time (including decompression latency) is critical in modern memory hierarchies. By combining inexpensive hardware support with modest OS support, our holistic approach to compression achieves substantial improvements in performance and energy efficiency across the memory hierarchy. Using this new approach, we make several major contributions in this thesis. First, we propose a new compression algorithm, Base-Delta-Immediate Compression (BΔI), that achieves high compression ratio with very low compression/decompression latency. BΔI exploits the existing low dynamic range of values present in many cache lines to compress them to smaller sizes using Base+Delta encoding. Second, we observe that the compressed size of a cache block can be indicative of its reuse. We use this observation to develop a new cache insertion policy for compressed caches, the Size-based Insertion Policy (SIP), which uses the size of a compressed block as one of the metrics to predict its potential future reuse. Third, we propose a new main memory compression framework, Linearly Compressed Pages (LCP), that significantly reduces the complexity and power cost of supporting main memory compression. We demonstrate that any compression algorithm can be adapted to fit the requirements of LCP, and that LCP can be efficiently integrated with the existing cache compression designs, avoiding extra compression/decompression. Finally, in addition to exploring compression-related issues and enabling practical solutions in modern CPU systems, we discover new problems in realizing hardware-based compression for GPU-based systems and develop new solutions to solve these problems. Thesis Committee: Todd C. Mowry (Co-Chair) Onur Mutlu (Co-Chair) Kayvon Fatahalian Davie A. Wood (University of Wisconsin-Madison) Douglas C. Burger (Microsoft) Michael A. Kozuch (Intel) Frank Pfenning, Head, Computer Science Department Andrew W. Moore, Dean, School of Computer Science Keywords: Data Compression, Memory Hierarchy, Cache Compression, Memory Compression, Bandwidth Compression, DRAM, Main Memory, Memory Subsystem CMU-CS-16-116.pdf (3.9 MB) ( 197 pages) Copyright Notice