Guaranteed Policy Performance in Reinforcement Learning
Author: Voloshin, Cameron
Year: 2024
Degree: Dissertation (Ph.D.)
Advisor: Yue, Yisong
Committee Members: Wierman, Adam C.; Yue, Yisong; Bouman, Katherine L.; Chaudhuri, Swarat
Option: Computing and Mathematical Sciences
DOI: 10.7907/n2fg-e554
Abstract
Decision-making is ubiquitous in everyday life. Increasingly, researchers are seeking answers on how to optimally solve sequential decision-making tasks. Thanks to the recent availability of computation, advances in deep learning, and the release of open-source code, it has become easy to train a computational agent to make decisions in many domains. Nevertheless, in realistic scenarios where the consequences of failure are high, running a trained computational agent in the wild poses substantial risk.
The goal of this thesis is to develop and advance techniques that guarantee a learned agent does what we expect it to do. The thesis tackles two central questions:
1) Given an agent, how can we predict if it will perform desirably?
2) Can we structure the learning process to guarantee desirable post-learning performance?
On the former question, this thesis proposes multiple algorithms to evaluate such agents, identifies factors that strongly influence the success of agent evaluation, and open-sources benchmarks for further development in the space.
On the latter question, this thesis formulates desirable agent behavior as a constrained optimization problem, with the types of constraints varying according to the structure afforded to the practitioner. Constraining the search space during the learning process ensures that post-learning behaviors will, by definition, perform as desired.
Files
- Cameron_Voloshin_2024_Thesis.pdf (application/pdf)