Computer Science Thesis Oral

— 12:00pm

Location:
In Person - Reddy Conference Room, Gates Hillman 4405

Speaker:
CHUN KAI LING, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
https://lingchunkai.github.io/

Scalable Learning and Solving of Extensive-Form Games

Strategic interactions between agents can be formalized by game theory. In recent years modern learning techniques, coupled with the ubiquity of data and proliferation of compute, have paved the way towards computationally solving large games, with broad applications ranging from poker to security.  However, several challenges arise when considering the current state of the art in large-scale game solving: (i) game parameters in many realistic settings are often unspecified; and (ii) there is a notable lack of scalable solvers for large, general-sum extensive-form games (most existing approaches being limited to zero-sum games).

In this thesis, we tackle these challenges in three ways. First, we study the inverse problem in game theory, where game parameters must be learned given only samples drawn from the game's quantal response equilibrium. We present an end-to-end, differentiable game-solving module fully compatible with modern deep learning architectures. This approach allows game parameters of two-player normal-form and extensive-form zero-sum games to be learned scalably via gradient descent.
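As background for this first direction, the following is a minimal sketch (not the thesis's actual module) of a logit quantal response equilibrium computed by damped fixed-point iteration in a two-player zero-sum normal-form game. Function names, the damping factor, and the rationality parameter `lam` are illustrative; the thesis additionally differentiates through the equilibrium to recover payoffs, which this sketch omits.

```python
import numpy as np

def softmax(v):
    v = v - v.max()          # shift for numerical stability
    e = np.exp(v)
    return e / e.sum()

def logit_qre(payoff, lam=1.0, iters=2000, tol=1e-10):
    """Damped fixed-point iteration for the logit quantal response
    equilibrium of a two-player zero-sum normal-form game.

    payoff: row player's payoff matrix A (the column player receives -A).
    lam: rationality parameter (larger lam approaches best response).
    """
    m, n = payoff.shape
    x = np.full(m, 1.0 / m)  # row player's mixed strategy
    y = np.full(n, 1.0 / n)  # column player's mixed strategy
    for _ in range(iters):
        # Logit (softmax) responses to the opponent's current strategy.
        x_new = softmax(lam * payoff @ y)
        y_new = softmax(lam * (-payoff.T) @ x)
        if max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < tol:
            x, y = x_new, y_new
            break
        # Damped update for stability of the fixed-point iteration.
        x = 0.5 * x + 0.5 * x_new
        y = 0.5 * y + 0.5 * y_new
    return x, y
```

In matching pennies, for example, the logit QRE coincides with the uniform Nash strategy for every value of `lam`, so the iteration converges immediately from the uniform start.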

Second, we extend state-of-the-art online Nash solvers currently used in zero-sum games to Stackelberg and correlated equilibria in general-sum extensive-form games. In both cases, we present efficient online search algorithms that avoid computation in branches of the game tree not encountered in actual play. These algorithms approximately solve large general-sum games that offline solvers struggle with, while still providing performance guarantees.
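To illustrate the core idea of online search, here is a toy stand-in (not the thesis's algorithms, which handle imperfect information and equilibrium constraints): during play, we solve only the subtree rooted at the state actually reached, so branches the opponent never steers us into are never expanded. All names and the game encoding are hypothetical.

```python
import functools

def online_play(root, children, payoff, choose_opponent):
    """Online play in a toy two-player zero-sum game tree.

    At each of our decision points we solve only the subtree rooted at
    the current state; subtrees under moves that were never reached in
    actual play are never evaluated.
    """
    @functools.lru_cache(maxsize=None)
    def value(state, maximizing):
        kids = children(state)
        if not kids:                      # leaf: return its payoff
            return payoff(state)
        vals = [value(k, not maximizing) for k in kids]
        return max(vals) if maximizing else min(vals)

    state, maximizing = root, True
    while children(state):
        if maximizing:
            # Solve the current subtree and act greedily on its value.
            state = max(children(state), key=lambda k: value(k, False))
        else:
            state = choose_opponent(state)  # opponent moves; we do no work
        maximizing = not maximizing
    return payoff(state)
```

The memoized `value` function is only ever invoked on states below the nodes visited in play, which is the (much simplified) analogue of restricting equilibrium computation to encountered branches.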

Third, we show how to compute Stackelberg equilibria in large perfect-information general-sum games by learning Enforceable Payoff Functions, which capture tradeoffs in utility between players, rather than the usual scalar values. We demonstrate how to learn these functions using a variant of fitted value iteration, and quantify the suboptimality incurred by errors in function approximation.
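For context, the scalar-valued baseline that this third contribution generalizes is classical fitted value iteration: back up Bellman targets, then project them onto a function class by regression. The sketch below runs it on a hypothetical single-agent chain MDP with polynomial features; the thesis instead learns payoff-tradeoff functions between players, which this single-agent sketch does not attempt.

```python
import numpy as np

def fitted_value_iteration(n_states, step, reward, gamma=0.9, n_iters=200):
    """Classical fitted value iteration with a least-squares approximator.

    step(i, a) -> next state index; reward(i, a) -> immediate reward.
    With a feature map expressive enough to interpolate all states,
    this reduces to tabular value iteration.
    """
    s = np.arange(n_states) / (n_states - 1)  # normalized state positions
    Phi = np.vander(s, n_states)              # polynomial design matrix
    w = np.zeros(n_states)                    # approximator weights
    actions = (-1, +1)                        # toy action set: left/right
    for _ in range(n_iters):
        V = Phi @ w                           # current value estimates
        # Bellman optimality backup: target(s) = max_a r(s,a) + gamma*V(s').
        targets = np.array([
            max(reward(i, a) + gamma * V[step(i, a)] for a in actions)
            for i in range(n_states)
        ])
        # Projection step: fit the function class to the backed-up targets.
        w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return Phi @ w
```

The "suboptimality from function approximation" mentioned above enters exactly at the projection step: when `Phi` cannot represent the targets, the least-squares residual propagates through subsequent backups.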

Thesis Committee:

J. Zico Kolter (Co-chair)
Fei Fang (Co-chair)
Tuomas Sandholm
Vincent Conitzer
Michael Bowling (University of Alberta)

