Doctoral Thesis Proposal - Yonghao Zhuang

— 2:00pm

Location:
In Person and Virtual - ET - Newell-Simon 4201 and Zoom

Speaker:
YONGHAO ZHUANG, Ph.D. Student
Computer Science Department
Carnegie Mellon University

https://zyhowell.github.io/

On Efficient Language Model Post-Training with Attention Disaggregation

Today's LLM training introduces additional stages to further improve model quality, among which post-training is the most important. Post-training workloads range from long-context training, which suffers from workload imbalance, to reinforcement learning (RL), which iteratively runs the "rollout generation - reward evaluation - policy update" pipeline.
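The loop below is a minimal, self-contained sketch of that three-step RL pipeline, written only to make the stages concrete. The function names, the toy policy, and the length-based reward are illustrative placeholders, not the systems or APIs from the thesis.

def generate_rollouts(policy, prompts):
    # Rollout generation: the current policy samples one response per prompt.
    return [(p, policy(p)) for p in prompts]

def evaluate_rewards(rollouts):
    # Reward evaluation: a reward model or verifier scores each (prompt, response) pair.
    return [float(len(response)) for _, response in rollouts]  # toy reward

def update_policy(policy, rollouts, rewards):
    # Policy update: in practice a gradient step (e.g., PPO-style) on the scored rollouts.
    return policy  # no-op in this sketch

def rl_post_training_step(policy, prompts):
    rollouts = generate_rollouts(policy, prompts)
    rewards = evaluate_rewards(rollouts)
    return update_policy(policy, rollouts, rewards)

policy = lambda prompt: prompt + " ... sampled continuation"
policy = rl_post_training_step(policy, ["prompt A", "prompt B"])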

This thesis proposes the concept of an attention server, in which the main part of attention (core attention) is disaggregated from the other components of the model and handled by an independent cluster of GPUs. The first benefit of this disaggregation is independent scaling, which enables a larger batch size for the other components. Second, the core attention kernel needs only a subset of a GPU's resources to saturate its memory bandwidth demand, allowing the remaining resources to be used for compute-intensive tasks.
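As a rough illustration of the disaggregation, the sketch below separates the memory-bandwidth-bound core attention (attention over the KV cache) from the projections and MLPs that would stay on the main cluster. The AttentionServer class, its methods, and the toy dimensions are assumptions made for illustration, not the thesis's actual interface.

import numpy as np

def core_attention(q, k, v):
    # Core attention only: bandwidth-bound work over the (potentially long) KV cache.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

class AttentionServer:
    # Stand-in for an independent GPU cluster that stores KV caches and serves
    # core-attention requests, scaled independently of the main compute cluster.
    def __init__(self):
        self.kv_cache = {}

    def append_kv(self, seq_id, k, v):
        ks, vs = self.kv_cache.setdefault(seq_id, ([], []))
        ks.append(k); vs.append(v)

    def attend(self, seq_id, q):
        ks, vs = self.kv_cache[seq_id]
        return core_attention(q, np.vstack(ks), np.vstack(vs))

# Main cluster: compute-intensive projections/MLPs stay local and can run at a larger
# batch size, since the KV-cache memory now lives on the attention servers.
server = AttentionServer()
d = 8
wq = wk = wv = np.eye(d)
x = np.random.randn(1, d)          # one new token for sequence "s0"
q, k, v = x @ wq, x @ wk, x @ wv
server.append_kv("s0", k, v)
out = server.attend("s0", q)       # core attention offloaded to the server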

Thesis Committee
Eric Xing (Chair)
Tianqi Chen
Zhihao Jia
Ion Stoica (University of California, Berkeley)
Hao Zhang (University of California, San Diego)

Additional Information

In Person and Zoom Participation.  See announcement. 

For More Information:
matthewstewart@cmu.edu

