Todd Mowry

Professor

ORCID 0000-0003-4076-5684

Office

9113 Gates and Hillman Centers

Email

tcm@cs.cmu.edu

Phone

(412) 268-3725

Department

Computer Science Department

Administrative Support

Christina Contreras

Research Areas

Systems

Research Interests

Computer Architecture

Databases

Advisees

Sam Arch

Hongyi Jin

Ruihang Lai

CSD Courses Taught

15418 - Spring, 2026

15618 - Spring, 2026

Research Statement

The goal of my research is to dramatically boost the performance of future microprocessor-based systems. To accomplish this, we exploit various forms of parallelism through a combination of novel architectural, compiler, and operating system support. In particular, we have been focusing on the opportunities and challenges created by two important VLSI technology trends that are expected to reshape computer systems over the next decade: the potential for single-chip multiprocessing made possible by higher levels of on-chip integration, and the need to tolerate off-chip latency as the gap between processor speed and the speed of memory and I/O continues to widen.

Single-Chip Multiprocessing: The STAMPede Project. As advances in integrated circuit technology continue to provide more and more transistors on a chip, processor architects face the pleasant challenge of finding the best way to translate these additional resources into improved performance. One of the more compelling options is to integrate multiple processors onto the same chip. While this will certainly increase computational throughput, it will reduce the execution time of a given application only if that application can run in parallel. Hence the key question is: how do we convert the applications that we care about into parallel programs? Expecting programmers to write only parallel programs from now on is unrealistic; instead, the preferred solution would be for the compiler to parallelize programs automatically. Unfortunately, compilers have so far succeeded only at parallelizing the numeric applications commonly run on supercomputers. For single-chip multiprocessing to have an impact on the majority of users, we must also find a way to automatically parallelize the non-numeric applications (e.g., spreadsheets, web software, and graphics codes) that account for the bulk of the software run on commercial microprocessors. Based on our preliminary studies, we believe that a breakthrough in our ability to automatically parallelize non-numeric applications may be possible through "thread-level data speculation", a technique that allows the compiler to safely parallelize applications in cases where it believes that dependences are unlikely but cannot statically prove that they do not exist. To accomplish this, we add modest hardware support to track data dependence violations at run time and alert the software so that it can recover appropriately.
Developing the architectural, compiler, and operating system support necessary to turn this potential into a reality is the goal of the STAMPede (Single-chip Tightly-coupled Architecture for MultiProcessing) project.
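To make the idea concrete, here is a minimal, single-threaded sketch in C of how thread-level data speculation behaves, assuming each loop iteration (epoch) performs exactly one load and one store. The arrays `rd` and `wr`, the loop body, and the functions `run_epoch`, `seq_run`, and `tls_run` are hypothetical illustrations, not STAMPede code. The dependence checks that the proposed hardware would perform are done here in plain C: all epochs "start at once" by reading the memory state at loop entry, stores are buffered until each epoch commits in program order, and an epoch whose load was overwritten by a logically earlier epoch is squashed and re-executed.

```c
#include <assert.h>
#include <string.h>

#define MEM_SIZE 16
#define NUM_EPOCHS 8

/* Hypothetical loop: iteration i computes mem[wr[i]] = 2 * mem[rd[i]] + i.
   A compiler that cannot prove rd[i] != wr[j] for all j < i must run it
   serially; here only iteration 5 actually depends on an earlier one. */
static const int rd[NUM_EPOCHS] = {0, 1, 2, 3, 4, 2, 6, 7};
static const int wr[NUM_EPOCHS] = {8, 9, 2, 10, 11, 12, 13, 14};

typedef struct {
    int read_addr;   /* address loaded by the epoch */
    int write_addr;  /* address of its buffered store */
    int write_val;   /* buffered value, invisible to other epochs until commit */
} Epoch;

/* Executes iteration i against the given memory image, buffering its store. */
static void run_epoch(const int *mem, int i, Epoch *e) {
    assert(rd[i] >= 0 && rd[i] < MEM_SIZE && wr[i] >= 0 && wr[i] < MEM_SIZE);
    e->read_addr = rd[i];
    e->write_addr = wr[i];
    e->write_val = 2 * mem[rd[i]] + i;
}

/* Sequential reference semantics of the loop. */
void seq_run(int *mem) {
    for (int i = 0; i < NUM_EPOCHS; i++)
        mem[wr[i]] = 2 * mem[rd[i]] + i;
}

/* Speculative version; returns the number of squashed epochs. */
int tls_run(int *mem) {
    Epoch ep[NUM_EPOCHS];
    int entry[MEM_SIZE];
    memcpy(entry, mem, sizeof entry);

    /* Speculative phase: every epoch sees only the state at loop entry,
       as if each ran on its own core at the same time. */
    for (int i = 0; i < NUM_EPOCHS; i++)
        run_epoch(entry, i, &ep[i]);

    int squashes = 0;
    for (int i = 0; i < NUM_EPOCHS; i++) {
        /* Violation check: if a logically earlier epoch stored to the address
           this epoch loaded, the load returned a stale value. */
        int violated = 0;
        for (int j = 0; j < i; j++)
            if (ep[j].write_addr == ep[i].read_addr)
                violated = 1;
        if (violated) {
            /* Recovery: discard the buffered store and re-execute against
               committed memory, which now reflects every earlier epoch. */
            run_epoch(mem, i, &ep[i]);
            squashes++;
        }
        mem[ep[i].write_addr] = ep[i].write_val;  /* commit in program order */
    }
    return squashes;
}
```

Running `seq_run` and `tls_run` on identical copies of memory yields identical results, and only iteration 5 is squashed, because it is the one iteration that reads a location (`mem[2]`) written by an earlier iteration. If dependences were frequent, most epochs would be squashed and speculation would gain little, which is why the technique targets dependences that are possible but unlikely.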

Coping with Large Latencies. Processor speeds continue to increase far more rapidly than the speeds of off-chip components such as DRAM, disks, and networks, which are held back largely by physical limitations such as distance and the speed of light. The challenge presented by this trend is that, from the processor's perspective, the latency of main memory and I/O is increasing at a dramatic rate and thus threatens to become an increasingly important performance bottleneck. The good news, however, is that the bandwidth of these off-chip devices has been improving through innovations such as synchronous (i.e., pipelined) DRAM, disk arrays, and fiber optic networks. Therefore, we are exploring new ways in which the compiler (with varying degrees of help from the hardware and the operating system) can use prefetching and other techniques to intelligently trade additional bandwidth consumption for reduced overall latency. Recent work in this area has included prefetching for pointer-based codes, prefetching to hide disk latency in out-of-core numeric applications, and hiding network communication latency in workstation clusters.
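As a rough illustration of trading bandwidth for latency in pointer-based code, the C sketch below uses one technique known from the literature as jump-pointer prefetching; it is an assumption for illustration, not necessarily the scheme used in this work. Each list node carries an extra pointer (`jump`) to the node `distance` hops ahead, and the traversal issues a prefetch through that pointer so that the cache miss for a future node overlaps with work on the current one. The `Node` layout and the functions `install_jump_pointers` and `sum_list` are hypothetical; `__builtin_prefetch` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    long value;
    struct Node *next;
    struct Node *jump;  /* node `distance` hops ahead, or NULL near the tail */
} Node;

/* One pass that points each node at the node `distance` hops ahead of it. */
void install_jump_pointers(Node *head, int distance) {
    assert(distance > 0);
    Node *lead = head;
    for (int k = 0; k < distance && lead != NULL; k++)
        lead = lead->next;
    for (Node *n = head; n != NULL; n = n->next) {
        n->jump = lead;
        if (lead != NULL)
            lead = lead->next;
    }
}

/* Traversal that prefetches `distance` nodes ahead while summing values. */
long sum_list(const Node *head) {
    long sum = 0;
    for (const Node *n = head; n != NULL; n = n->next) {
        /* Only a hint: a useless prefetch costs bandwidth, never correctness. */
        if (n->jump != NULL)
            __builtin_prefetch(n->jump);
        sum += n->value;
    }
    return sum;
}
```

The extra pointer per node and any prefetches that turn out to be useless are the bandwidth cost; the payoff is that, with a suitable `distance`, much of the latency of each pointer dereference can be hidden. Prefetching only `n->next` gives little lead time, because the address of a node further ahead is unknown until the intervening nodes have been loaded.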

Publications

Preprint

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

2026
Jin H, Hou B, Wang G, Lai R, Chen J, Ye Z, Cai Y, Dong Y, Cheng X, Zhang Z, Zhao Y, Huang Y, Yang L, Jiang J, Oliaro G, Ji J, Miao X, Grover V, Mowry TC, Jia Z, Chen T

Journal Article

Partial UDF Inlining

2026 • SIGMOD Record • 55(1):74-83
Arch S, Liu Y, Mowry TC, Pate JM, Pavlo A

Conference

LithOS: An Operating System for Efficient Machine Learning on GPUs

2025 • Proceedings of the 2025 ACM SIGOPS 31st Symposium on Operating Systems Principles (SOSP 2025) • 1-17
Coppock PH, Zhang B, Solomon EH, Kypriotis V, Yang L, Sharma B, Schatzberg D, Mowry TC, Skarlatos D

Preprint

LithOS: An Operating System for Efficient Machine Learning on GPUs

2025
Coppock PH, Zhang B, Solomon EH, Kypriotis V, Yang L, Sharma B, Schatzberg D, Mowry TC, Skarlatos D

Preprint

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

2025
Lai R, Shao J, Feng S, Lyubomirsky SS, Hou B, Lin W, Ye Z, Jin H, Jin Y, Liu J, Jin L, Cai Y, Jiang Z, Wu Y, Park S, Srivastava P, Roesch JG, Mowry TC, Chen T
