5th Year Master's Thesis Presentation - James Kim
April 20, 2026, 9:00 AM - 10:30 AM
Location:
In Person - Reddy Conference Room, Gates Hillman 4405
Speaker:
James Kim, Master's Student
Computer Science Department
Carnegie Mellon University
https://www.biojameskim.me/
Large Reasoning Models (LRMs) have recently achieved impressive performance on a variety of reasoning tasks, including mathematics and code generation. Their long chains of reasoning enable scaling of inference-time computation, allowing them to solve increasingly complex problems. LRMs are typically post-trained from base models using supervised fine-tuning (SFT), reinforcement learning (RL), or a combination of both. RL is often hypothesized to be a key driver of reasoning ability, as it enables models to explore and discover new solutions. However, recent work suggests that RL may instead concentrate probability mass on existing solutions, and the mechanisms by which RL leads to reasoning ability remain poorly understood. In this thesis, we study RL training mechanisms through the lens of compositional generalization, a key sub-skill of reasoning that involves combining atomic skills to solve more complex problems. We find that RL-trained models substantially outperform those trained with standard SFT on compositional generalization. To isolate the effects of RL, we decompose it into three components: on-policy data, the use of negative samples, and objective design. By ablating each component, we establish a progression from SFT to RL and identify each component's contribution. Empirically, we find that both on-policy data and negative samples are critical for the emergence of compositional generalization, while objective design choices (e.g., group normalization) have a relatively small impact. Our findings suggest that effective post-training of LLMs requires understanding and carefully designing the individual components of the training pipeline, rather than treating RL as a monolithic improvement.
Thesis Committee
Chenyan Xiong (Chair)
Aditi Raghunathan
Additional Information
For More Information:
amalloy@cs.cmu.edu