SCS Faculty Candidate

May 2, 2024 10:00am — 12:00pm

Location:
In Person and Virtual (ET): Newell-Simon 4305 and Zoom

Speaker:
Tim Dettmers, Ph.D. Candidate, Paul G. Allen School of Computer Science & Engineering, University of Washington
https://timdettmers.com/

Accessible Foundation Models: Systems, Algorithms, and Science

The ever-increasing scale of foundation models, such as ChatGPT and AlphaFold, has revolutionized AI and, more broadly, science. However, increasing scale also steadily raises computational barriers, blocking almost everyone from studying, adapting, or otherwise using these models for anything beyond static API queries.

In this talk, I will present research that significantly lowers these barriers across a wide range of use cases: inference, where a trained model is used to make predictions; finetuning, where a trained model is adapted to new data; and full training of foundation models from scratch.

For inference, I will describe our LLM.int8() algorithm, which showed how to enable high-precision 8-bit matrix multiplication that is both fast and memory efficient. LLM.int8() is based on the discovery and characterization of sparse outlier sub-networks that only emerge at large model scales but are crucial for effective Int8 quantization.
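The outlier decomposition described above can be illustrated with a toy pure-Python sketch. This is not the bitsandbytes implementation, and several details are assumptions chosen for illustration: the function names, the list-of-lists matrix layout, and the outlier threshold of 6.0. Input feature dimensions that contain a large-magnitude value are multiplied in floating point, while the remaining dimensions use vector-wise absmax quantization to integers in [-127, 127], with one scale per row of X and one per column of W.

```python
# Toy sketch of LLM.int8()-style mixed-precision decomposition (illustrative
# assumptions, not the bitsandbytes implementation). X is tokens x features,
# W is features x outputs, both plain lists of lists.

def find_outlier_dims(X, threshold=6.0):
    """Feature dimensions where any input value reaches the (assumed) threshold."""
    return [k for k in range(len(X[0])) if any(abs(row[k]) >= threshold for row in X)]


def absmax_quantize(vec):
    """Quantize a vector to integers in [-127, 127] using one absmax scale."""
    top = max((abs(v) for v in vec), default=0.0)
    if top == 0.0:
        return [0] * len(vec), 1.0  # all-zero (or empty) vector: any scale works
    return [round(127 * v / top) for v in vec], top / 127


def int8_matmul_with_outliers(X, W, threshold=6.0):
    outliers = set(find_outlier_dims(X, threshold))
    regular = [k for k in range(len(W)) if k not in outliers]
    cols = list(zip(*W))  # columns of W, one per output
    # Vector-wise quantization: one scale per row of X and one per column of W,
    # restricted to the non-outlier feature dimensions.
    qx = [absmax_quantize([row[k] for k in regular]) for row in X]
    qw = [absmax_quantize([col[k] for k in regular]) for col in cols]
    out = []
    for row, (xi, sx) in zip(X, qx):
        out_row = []
        for col, (wj, sw) in zip(cols, qw):
            # Integer dot product, then dequantize with the product of both scales.
            low_precision = sum(a * b for a, b in zip(xi, wj)) * sx * sw
            # Outlier dimensions are multiplied in full floating point.
            high_precision = sum(row[k] * col[k] for k in outliers)
            out_row.append(low_precision + high_precision)
        out.append(out_row)
    return out


X = [[1.0, 20.0, -0.5], [0.25, -15.0, 2.0]]
W = [[0.5, -1.0], [0.1, 0.2], [1.5, 0.3]]
approx = int8_matmul_with_outliers(X, W)
exact = [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]
```

In this toy input, the second feature dimension (values 20.0 and -15.0) is treated as an outlier and kept in floating point, and every entry of the result stays within roughly 0.01 of the exact product.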

For finetuning, I will introduce the QLoRA algorithm, which pushes such quantization much further to enable finetuning of very large models on a single GPU: it updates only a small set of parameters while keeping most of the network in a new, information-theoretically optimal 4-bit representation.
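The QLoRA recipe can likewise be sketched in pure Python. The base weight is stored as 4-bit codes (indices 0 to 15 into a 16-level table) with one absmax scale per block and is dequantized on the fly in the forward pass. Only the small low-rank adapter matrices A and B would receive gradient updates. This is a simplified illustration, not the QLoRA or bitsandbytes implementation. The codebook here is a symmetric table of evenly spaced normal quantiles; the paper's NormalFloat4 type is constructed differently, for example so that zero is represented exactly. Each row of W is treated as one quantization block, and the sizes, the rank r, and the alpha / r scaling are assumptions chosen for illustration.

```python
# Simplified sketch of a QLoRA-style layer (illustrative assumptions only):
# frozen 4-bit base weight plus trainable low-rank adapters A and B.
import random
from statistics import NormalDist


def normal_codebook(bits=4):
    """2**bits levels at evenly spaced N(0, 1) quantiles, rescaled to [-1, 1].

    Simplified stand-in for NormalFloat4; unlike NF4 it has no exact zero.
    """
    n = 2 ** bits
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    top = max(abs(q) for q in qs)
    return [q / top for q in qs]


def quantize_block(block, codebook):
    """Absmax-scale a block, then store each value as its nearest level's index."""
    scale = max(abs(v) for v in block) or 1.0
    codes = [min(range(len(codebook)), key=lambda c: abs(codebook[c] - v / scale))
             for v in block]
    return codes, scale


def dequantize_block(codes, scale, codebook):
    return [codebook[c] * scale for c in codes]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def lora_forward(x, W_q, A, B, alpha, r, codebook):
    """y = x @ dequant(W_q) + (alpha / r) * (x @ A) @ B; W_q stays frozen."""
    W = [dequantize_block(codes, scale, codebook) for codes, scale in W_q]
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    s = alpha / r
    return [[b + s * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]


random.seed(0)
d_in, d_out, r = 8, 6, 2
codebook = normal_codebook()
W = [[random.gauss(0, 0.02) for _ in range(d_out)] for _ in range(d_in)]
W_q = [quantize_block(row, codebook) for row in W]  # frozen, stored as 4-bit codes
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]  # trainable
B = [[0.0] * d_out for _ in range(r)]  # trainable, zero-init: adapter starts as a no-op
x = [[random.gauss(0, 1) for _ in range(d_in)]]
y = lora_forward(x, W_q, A, B, alpha=16, r=r, codebook=codebook)
trainable = d_in * r + r * d_out  # parameters updated during finetuning
frozen = d_in * d_out  # parameters kept in the 4-bit representation
```

Because B starts at zero, the adapted layer initially reproduces the quantized base layer exactly. Finetuning then changes only the d_in * r + r * d_out adapter entries (28 here), while the 48 base weights stay frozen in their 4-bit form.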

For full training, I will present SWARM parallelism, which allows collaborative training of foundation models across continents on standard internet infrastructure while still being 80% as effective as the prohibitively expensive supercomputers that are currently used.

Finally, I will close by outlining my plans to make foundation models 100x more accessible, which will be needed to maintain truly open AI-based scientific innovation as models continue to scale.

—

Tim Dettmers’s research focuses on making foundation models, such as ChatGPT, accessible to researchers and practitioners by reducing their resource requirements. This involves developing novel compression and networking algorithms and building systems that allow for memory-efficient, fast, and cheap deep learning. These methods enable many more people to use, adapt, or train foundation models without affecting the quality of AI predictions or generations. He is a PhD candidate at the University of Washington and has won oral, spotlight, and best paper awards at conferences such as ICLR and NeurIPS. He created the bitsandbytes library for efficient deep learning, which receives 1.7 million installations per month and has received Google Open Source and PyTorch Foundation awards.

Faculty Hosts:
Zico Kolter
Ameet Talwalkar

In Person and Zoom Participation. See announcement.