Monday, May 10, 2021 - 3:00pm to 4:00pm
Location:Virtual Presentation - ET Remote Access - Zoom
Speaker:GABRIELE FARINA, Ph.D. Student /GABRIELE%20FARINA
Tree-Form Sequential Decision Making: Analytic Foundations, Online Learning, and Game-Theoretic Solution Concepts
Tree-form, sequential decision making (TFSDM) captures tree-form decision processes where the decision maker faces multiple decisions interleaved with observations about the way previous decisions affected the (potentially adversarial) environment. TFSDM provides a powerful and general formalism which captures the decision problem faced by a player of an adversarial imperfect-information extensive-form game (such as poker and other non-recreational strategic interactions), as well as partially-observable Markov decision processes for which the agent conditions its policy on the entire history of observations and actions. By allowing interleaved decisions and observations, TFSDM is significantly more sophisticated than non-sequential decision making, where only one action is planned 'in a vacuum' prescinding from any observation about the state of the system. This sophistication is reflected in the underlying mathematical and computational details. Our understanding of TFSDM making lacks decisively behind that of non-sequential decision making. I have been working towards shedding some light on the theoretical and algorithmic foundations of tree-form sequential decision making, as well as the richness of connections between TFSDM and game theory.
In this proposed thesis, i) we develop several composable online learning and optimization tools that achieve state-of-the-art performance in many settings, including Nash equilibrium computation in adversarial bandit settings; ii) we give state-of-the-art algorithms that enable, for the first time, finding exact trembling-hand refinements of Nash equilibrium in real, large-scale games (including real poker endgames with half a billion terminal states), a recognized open problem; iii) we give efficient, decentralized no-regret learning dynamics for extensive-form correlation in multiplayer games; iv) we study the problem of computing optimal coordinated/collusive behavior in zero-sum adversarial team games, and give state-of-the-art algorithms and positive complexity results for that setting; v) we give fundamental results about the geometry of correlated strategies, identifying new important classes of games where optimal (social-welfare-maximizing) correlated equilibria can be computed in polynomial time, the first positive complexity results for extensive-form correlation in more than a decade; and more.
Tuomas Sandholm (Chair)
Geoffrey J. Gordon
Avrim Blum (TTIC)
Vincent Conitzer (Duke University)
Constantinos Daskalakis (Massachusetts Institute of Technology)
Zoom Participation. See announcement.