SCS Faculty Candidate

— 12:00pm

In Person and Virtual - ET - Newell-Simon 4305 and Zoom

LIANMIN ZHENG, Ph.D. Student, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

Scalable and Efficient Systems for Large Language Models

Large Language Models (LLMs) have been driving recent breakthroughs in AI. These advancements would not have been possible without the support of scalable and efficient infrastructure systems. In this talk, I will introduce several systems I have designed and built to support the entire model lifecycle, from training to deployment to evaluation. 

First, I will present Alpa, a system for scalable model-parallel training that automatically generates execution plans unifying inter- and intra-operator parallelism. Next, I will discuss SGLang, an efficient deployment system covering both the frontend programming interface and backend runtime optimizations for high-performance inference. 

Finally, I will complete the model lifecycle by presenting our model evaluation efforts, including the crowdsourced live benchmark platform, Chatbot Arena, and the automatic evaluation pipeline, LLM-as-a-Judge. Together, these projects have laid a solid foundation for large language model systems and have been widely adopted by leading LLM developers and companies.

I will conclude by outlining some future directions, such as a programmatic and composable software stack for using LLMs and further improvements with synthetic data. 

Lianmin Zheng is a Ph.D. student in the EECS department at UC Berkeley, advised by Ion Stoica and Joseph E. Gonzalez. His research interests include machine learning systems, large language models, compilers, and distributed systems. He builds full-stack, scalable, and efficient systems to advance the development of AI. He co-founded, where he leads impactful open-source large language model projects such as Vicuna and Chatbot Arena, which have received millions of downloads and served millions of users. He has received a Meta Ph.D. Fellowship, an IEEE Micro Best Paper Award, and an a16z open-source AI grant. 

Faculty Host: Tianqi Chen 

