Computer Science 5th Year Master's Thesis Presentation - POSTPONED

April 24, 2024 10:00am
Location: In Person - Traffic21 Classroom, Gates Hillman 6501
Speaker: BRIAN ZHANG - To Be Rescheduled, Master's Student, Computer Science Department, Carnegie Mellon University

Towards an OS for GPUs: Threadblock Scheduling for Deep Learning Workloads

As the year-over-year performance gains of CPUs have stagnated with the death of Moore's Law, GPUs and other data-parallel chips have seen a surge in demand, particularly for use in datacenter deep learning workloads. In spite of the growing demand, many companies are unable to fully utilize the hardware that is already in their datacenters. In fact, Alibaba reported a median GPU utilization of less than 10% in 2020. This number implies vast over-provisioning and points to the benefits to be gained via GPU multi-tenancy. Just as multi-tenancy on traditional CPU architectures is facilitated by an OS, we believe that an OS can similarly solve this problem for GPUs.

In this thesis we describe the design and implementation of the compute scheduler of AxOS, an OS for data-parallel accelerators. AxOS allows for transparency, high GPU utilization, performance isolation, and spatial stacking between multiple processes using the GPU. To achieve this, AxOS takes a novel threadblock-centric approach to GPU compute scheduling via the virtual streams abstraction, kernel chunking, and rightsizing. We evaluate AxOS on a number of deep learning workloads to show these benefits.

Thesis Committee:
Dimitrios Skarlatos (Chair)
Todd Mowry