Doctoral Thesis Oral Defense - Mingjie Sun

May 20, 2025, 9:00am - 11:00am
Location: In Person and Virtual - ET - Reddy Conference Room, Gates Hillman 4405 and Zoom

Speaker: MINGJIE SUN, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
https://eric-mingjie.github.io/

Hidden Properties of Large Language Models

Large Language Models (LLMs) are deep learning models trained to understand and generate natural language. Over the course of my PhD, these models have profoundly transformed the field of machine learning. Despite their remarkable success, most of our interactions with LLMs remain largely black-box, leaving key questions about their internal mechanisms and behaviors under-explored.

This thesis investigates previously overlooked hidden properties of LLMs across three dimensions: internal weight structure, activation patterns, and output behaviors. First, we demonstrate that the weight space of LLMs is intrinsically sparse and present a principled pruning approach capable of extracting efficient sparse subnetworks directly from pre-trained models. Next, we reveal the existence of structured activation outliers in LLMs, which we call "massive activations". These activations, despite their rarity, are exceptionally high in their magnitudes. We establish their strong connection to the self-attention mechanism and propose a novel attention formulation that mitigates these extreme outliers. Finally, we characterize the idiosyncrasies of LLM outputs, showing that generations from different models can be distinguished with remarkably high accuracies. We further identify the specific signatures that underlie these differences. Collectively, these findings provide an alternative perspective on modern foundation models.

Thesis Committee
J. Zico Kolter (Chair)
Graham Neubig
Aditi Raghunathan
Kaiming He (Massachusetts Institute of Technology)

In Person and Zoom Participation. See announcement.