Doctoral Thesis Proposal - Zhihao Zhang

— 11:00am

Location:
6501 - Gates & Hillman Centers

Speaker:
ZHIHAO ZHANG, Ph.D. Student, Computer Science Department, Carnegie Mellon University

A Path Towards Efficient Large Language Model Deployment through Algorithm and System Co-Design

Recent advances in large language models (LLMs), driven by both training-time and test-time scaling, have shown promising results across diverse downstream tasks. However, the fast pace of large-model development poses significant challenges for energy cost and efficient deployment.

To address these challenges, my thesis centers on bridging the gap between algorithms and systems through co-design for more efficient large-model deployment. Specifically, it pursues two directions:

  1. hardware-guided algorithmic exploration for efficient LLM inference; and
  2. LLM inference-specific system optimizations that maximize hardware utilization.

For algorithmic improvements, I will present two lines of research: Speculative Decoding (SpecInfer, RaLMSpec) and Sparse Attention (TidalDecode, LessisMore).

For system optimizations, I will present one project on LLM deployment with MegaKernel (MPK) and one ongoing project that generalizes the megakernel runtime to support multi-LLM deployment.

By combining algorithmic and system-level optimizations, the proposed thesis aims to provide an effective solution for reducing the energy cost and improving the efficiency of real-world LLM deployment.

Thesis Committee

Zhihao Jia (Chair)
Tianqi Chen
Dimitrios Skarlatos
Ravi Netravali (Princeton University)

Additional Information

For More Information:
Matt Stewart

