Doctoral Thesis Oral Defense - Mingjie Sun

Time:
11:00am

Location:
In Person and Virtual (ET): Reddy Conference Room, Gates Hillman 4405, and Zoom

Speaker:
MINGJIE SUN, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
https://eric-mingjie.github.io/

Hidden Properties of Large Language Models

Large Language Models (LLMs) are deep learning models trained to understand and generate natural language. Over the course of my PhD, these models have profoundly transformed the field of machine learning. Despite their remarkable success, our interactions with LLMs remain largely black-box, leaving key questions about their internal mechanisms and behaviors under-explored.

This thesis investigates previously overlooked hidden properties of LLMs across three dimensions: internal weight structure, activation patterns, and output behaviors. First, we demonstrate that the weight space of LLMs is intrinsically sparse and present a principled pruning approach capable of extracting efficient sparse subnetworks directly from pre-trained models. Next, we reveal the existence of structured activation outliers in LLMs, which we call "massive activations". Despite their rarity, these activations have exceptionally large magnitudes. We establish their strong connection to the self-attention mechanism and propose a novel attention formulation that mitigates these extreme outliers. Finally, we characterize the idiosyncrasies of LLM outputs, showing that generations from different models can be distinguished with remarkably high accuracy, and we identify the specific signatures that underlie these differences. Collectively, these findings provide an alternative perspective on modern foundation models.

Thesis Committee

J. Zico Kolter (Chair)
Graham Neubig
Aditi Raghunathan
Kaiming He (Massachusetts Institute of Technology)

In Person and Zoom Participation. See announcement.

