Special Artificial Intelligence Seminar

4:00pm

Location:
In Person and Virtual - ET - ASA Conference Room, Gates Hillman 6115 and Zoom

Speaker:
SADHIKA MALLADI, Ph.D. Student, Department of Computer Science, Princeton University
https://www.cs.princeton.edu/~smalladi/


Theory and Practice in Language Model Fine-Tuning

Fine-tuning ever larger and more capable language models (LMs) has proven to be an effective way to solve a variety of language-related tasks. Yet little is understood about what fine-tuning does, and most traditional optimization analyses cannot account for a pre-trained initialization.

I will start by formalizing the common intuition that fine-tuning makes a small change to the model. Inspired by the neural tangent kernel (NTK), we propose an empirically validated and theoretically sound hypothesis that can begin to answer questions like "Why doesn't a giant LM overfit when fine-tuning it on a few dozen examples?" and "Why does LoRA work?" Our simple mental model motivates an efficient, transferable, and optimizer-aware data selection algorithm, dubbed LESS, to elicit specific capabilities during instruction tuning. Using LESS to select 5% of the data outperforms training on the full dataset, and we can also use a small model to select data for other models.

Finally, I will describe how insights into the dynamics of fine-tuning inspired us to design a memory-efficient zeroth-order algorithm (MeZO) that can tune large LMs. MeZO frequently matches the performance of standard fine-tuning while using up to 12x less memory and half as many GPU-hours. These works were done in collaboration with researchers at Princeton University and the University of Washington.

— 

Sadhika Malladi is a PhD student at Princeton University advised by Sanjeev Arora. She has worked at OpenAI, Cerebras, and Microsoft Research. She graduated from MIT in 2019 with a degree in mathematics and computer science and a degree in philosophy. Her work focuses on the interplay between theory and empirics, especially with respect to language models.

The AI Seminar is sponsored by SambaNova Systems.

In Person and Zoom Participation.  See announcement.

Event Website:
http://www.cs.cmu.edu/~aiseminar/