CMU Flame Center Seminar - Jacob Springer

2:00pm ET

Location:
In Person and Virtual - Tepper Building 1403 and Zoom

Speaker:
JACOB SPRINGER, Ph.D. Student, Machine Learning Department, Carnegie Mellon University
https://sprin.xyz/

Overtrained Language Models Are Harder to Fine-Tune

Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instruction-tuned OLMo-1B model pre-trained on 3T tokens performs over 2% worse on multiple standard LLM benchmarks than its 2.3T-token counterpart. Through controlled experiments and theoretical analysis, we show that catastrophic overtraining arises from a systematic increase in the broad sensitivity of pre-trained parameters to modifications, including but not limited to fine-tuning. Our findings call for a critical reassessment of pre-training design that considers the downstream adaptability of the model.

Paper Reference


Jacob Springer is a third-year PhD student in the Machine Learning Department at CMU, advised by Aditi Raghunathan. His research broadly focuses on the science of foundation models, emphasizing pre-training, fine-tuning, and optimization. His current research investigates factors affecting the adaptability of language models to new tasks—through fine-tuning or prompting—across all stages of the model lifecycle, from pre-training to inference. 

In Person and Zoom Participation. See announcement.

Event Website:
https://www.cmu.edu/flame/events/index.html

