5th Year Master's Thesis Presentation - James Kim

April 20, 2026, 9:00 AM to 10:30 AM

Location:
In Person - Reddy Conference Room, Gates Hillman 4405

Speaker:
JAMES KIM, Master's Student
Computer Science Department
Carnegie Mellon University

https://www.biojameskim.me/

Dissecting Reinforcement Learning: Mechanisms Behind Compositional Reasoning in LLMs

Recently, Large Reasoning Models (LRMs) have achieved impressive performance on a variety of reasoning tasks, including mathematics and code generation. Their long chains of reasoning enable scaling of inference-time computation, allowing them to solve increasingly complex problems. LRMs are typically post-trained from base models using supervised fine-tuning (SFT), reinforcement learning (RL), or a combination of both. RL is often hypothesized to be a key driver of reasoning ability, as it enables models to explore and discover new solutions. However, recent work suggests that RL may instead concentrate probability mass on existing solutions. The mechanisms by which RL leads to reasoning ability remain poorly understood. In this thesis, we study RL training mechanisms through the lens of compositional generalization, a key sub-skill of reasoning that involves combining atomic skills to solve more complex problems. We find that RL-trained models substantially outperform those trained with standard SFT on compositional generalization. To isolate the effects of RL, we decompose it into three components: on-policy data, the use of negative samples, and objective design. By ablating each component, we establish a progression from SFT to RL and identify each component's contribution. Empirically, we find that both on-policy data and negative samples are critical for the emergence of compositional generalization, while objective design choices (e.g., group normalization) have a relatively small impact. Our findings suggest that effective post-training of LLMs requires understanding and carefully designing the individual components of the training pipeline, rather than treating RL as a monolithic improvement.

Thesis Committee
Chenyan Xiong (Chair)
Aditi Raghunathan

Additional Information 

For More Information:
amalloy@cs.cmu.edu

