Computer Science Thesis Oral

Friday, January 28, 2022 - 10:00am to 12:00pm


Virtual Presentation - ET Remote Access - Zoom


RUOSONG WANG, Ph.D. Student, Computer Science Department, Carnegie Mellon University

Tackling Challenges in Modern Reinforcement Learning: Long Planning Horizon and Large State Space

Modern reinforcement learning (RL) methods have achieved phenomenal success in various applications. However, reinforcement learning problems with large state spaces and long planning horizons remain challenging due to the excessive sample complexity burden, and our current theoretical understanding of such problems is rather limited. Moreover, there are important settings (including unsupervised RL and general objective functions) that cannot be addressed by the classical RL framework. In this thesis, we study the above issues to build a better understanding of modern RL methods. This thesis is divided into the following three parts:

Part I: RL with Long Planning Horizon. Learning to plan for long horizons is a central challenge in reinforcement learning, and a fundamental question is how the difficulty of the problem scales as the horizon increases. In the first part of this thesis, we show that tabular episodic reinforcement learning is possible with a sample complexity that is completely independent of the planning horizon. Long horizon RL is therefore no more difficult than short horizon RL, at least in a minimax sense.

Part II: RL with Large State Space. In modern RL methods, function approximation schemes are deployed to deal with the large state space. Empirically, combining various RL function approximation algorithms with neural networks for feature extraction has led to tremendous successes on various tasks. However, these methods often require a large number of samples to learn a good policy, and it is unclear whether there are fundamental statistical limits on such methods. In the second part of this thesis, we study, through theoretical analysis and experiments, necessary and sufficient conditions on the representation power of the features that permit sample-efficient reinforcement learning.

Part III: RL in Other Settings. Classical reinforcement learning paradigms aim to maximize the cumulative reward when the agent has access to the reward values. Although this framework can formulate a large family of reinforcement learning problems, there are important applications that cannot be cast into it. In the third part of this thesis, we study two new settings that generalize the classical RL framework: the reward-free setting and RL with general objective functions.

Thesis Committee:
Ruslan Salakhutdinov (Chair)
Zico Kolter
Aarti Singh
Sham M. Kakade (Harvard University)

Zoom Participation. See announcement.
