Offline reinforcement learning (RL) involves learning policies from previously collected datasets rather than through online interaction with the environment. This dissertation first investigates a critical component of offline RL: offline policy selection (OPS). Because most offline RL algorithms require careful hyperparameter tuning, we must select the best policy among a set of candidate policies before deployment. In the first part of the dissertation, we clarify when OPS is sample efficient by establishing a clear connection to off-policy policy evaluation (OPE) and Bellman error estimation. The dissertation then presents algorithms that leverage offline data. We begin by examining environments that contain exogenous variables, over which the agent has limited influence, and endogenous variables, which are fully under the agent's control. We show that policy evaluation and selection become straightforward under such conditions. We also present an algorithm based on Fitted-Q Iteration with data augmentation and show that it can find nearly optimal policies with polynomial sample complexity. We then study OPE in non-stationary environments and introduce the regression-assisted doubly robust estimator, which incorporates past data without introducing a large bias and improves on existing OPE estimators through the use of auxiliary information and a regression approach. We evaluate our algorithms on a variety of problems, some built from real-world datasets, including optimal order execution, inventory management, hybrid car control, and recommendation systems.
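As background for the Fitted-Q Iteration variant mentioned above, the following is a minimal sketch of classical Fitted-Q Iteration on a fixed offline dataset, not the dissertation's augmented algorithm. For simplicity it assumes integer state and action indices and uses a lookup table that averages Bellman targets per state-action pair in place of a general function approximator; the function name and transition format are illustrative choices.

```python
import numpy as np

def fitted_q_iteration(transitions, num_actions, gamma=0.99, iters=50):
    """Classical Fitted-Q Iteration on a fixed offline dataset (tabular sketch).

    transitions: list of (s, a, r, s_next, done) with integer states/actions.
    Each iteration regresses Q onto one-step Bellman targets computed from
    the dataset; here the "regressor" is a table averaging targets per (s, a).
    """
    num_states = 1 + max(max(s, sn) for s, _, _, sn, _ in transitions)
    q = np.zeros((num_states, num_actions))
    for _ in range(iters):
        targets_sum = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, sn, done in transitions:
            # One-step Bellman target built from the previous Q estimate.
            target = r + (0.0 if done else gamma * q[sn].max())
            targets_sum[s, a] += target
            counts[s, a] += 1
        # "Fit" step: average the targets observed for each (s, a) pair.
        mask = counts > 0
        q[mask] = targets_sum[mask] / counts[mask]
    return q
```

The key property is that all targets come from the fixed dataset; no new environment interaction occurs between iterations, which is what makes the procedure applicable in the offline setting.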
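The regression-assisted estimator mentioned above builds on the standard doubly robust OPE estimator; a minimal one-step (bandit-form) version of that standard estimator is sketched below for orientation. The function name and argument layout are illustrative, and the regression-assisted, non-stationary extension described in the dissertation is not reproduced here.

```python
import numpy as np

def doubly_robust_estimate(rewards, behavior_probs, target_probs, q_hat, v_hat):
    """Standard doubly robust off-policy value estimate (one-step form).

    rewards        : observed rewards r_i logged under the behavior policy
    behavior_probs : mu(a_i | s_i), probability the behavior policy chose a_i
    target_probs   : pi(a_i | s_i), probability the target policy chooses a_i
    q_hat          : model estimate of Q(s_i, a_i) for the logged action
    v_hat          : model estimate of V^pi(s_i) = E_{a ~ pi}[Q(s_i, a)]

    The model term v_hat keeps variance low, while the importance-weighted
    residual rho * (r - q_hat) corrects any bias in the model.
    """
    rho = target_probs / behavior_probs  # per-sample importance ratios
    return float(np.mean(v_hat + rho * (rewards - q_hat)))
```

The estimator is unbiased if either the importance ratios or the value model is correct, which is the "doubly robust" property that later refinements, including regression-assisted variants, aim to preserve while reducing variance.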