Belief Reward Shaping in Reinforcement Learning

Ofir Marom; Benjamin Rosman

doi:10.1609/aaai.v32i1.11741

JOURNAL ARTICLE

Year: 2018
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 32 (1)
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, but it can lead to problems with convergence. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process (MDP) augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. More generally, our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments on a gridworld and a more complex backgammon domain show that tasks can be learned significantly faster when intuitive priors are specified on the reward distribution.
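The abstract does not give the update equations, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a conjugate Normal prior on the mean reward of each state-action pair with known noise variance, so that the posterior mean is a weighted average of the prior belief and the observed rewards, and the weight on the prior shrinks as the visit count grows. The names `BeliefReward`, `prior_strength`, and `q_learning_step` are hypothetical and chosen for this example.

```python
from collections import defaultdict


class BeliefReward:
    """Posterior-mean reward estimate per (state, action) pair.

    Assumption (not from the paper): Normal prior on the mean reward with
    known noise variance, so the prior acts like `prior_strength`
    pseudo-observations of value `prior_mean(state, action)`.
    `prior_strength` must be positive.
    """

    def __init__(self, prior_mean, prior_strength=5.0):
        self.prior_mean = prior_mean          # callable: (state, action) -> float
        self.prior_strength = prior_strength  # pseudo-count given to the prior
        self.count = defaultdict(int)         # visits per (state, action)
        self.total = defaultdict(float)       # summed observed rewards

    def observe(self, state, action, reward):
        """Record one environment reward for this pair."""
        self.count[(state, action)] += 1
        self.total[(state, action)] += reward

    def prior_weight(self, state, action):
        """Share of the estimate still owed to the prior; tends to 0 with visits."""
        k = self.prior_strength
        return k / (k + self.count[(state, action)])

    def mean(self, state, action):
        """Posterior mean: prior pseudo-observations blended with real rewards."""
        key = (state, action)
        k = self.prior_strength
        return (k * self.prior_mean(state, action) + self.total[key]) / (k + self.count[key])


def q_learning_step(q, belief, actions, s, a, r, s_next, done, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update that uses the belief-based reward.

    The observed reward `r` first updates the belief; the posterior mean then
    replaces `r` in the usual temporal-difference target.
    """
    belief.observe(s, a, r)
    r_belief = belief.mean(s, a)
    bootstrap = 0.0 if done else gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r_belief + bootstrap - q[(s, a)])
    return q[(s, a)]
```

In this sketch `q` is expected to be a `defaultdict(float)` keyed by `(state, action)`. Because `prior_weight` decays like `k / (k + n)`, a pair that has been visited many times is driven almost entirely by the rewards actually observed, which is the intuition behind beliefs that fade with experience.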

Keywords:

Reinforcement learning; Markov decision process; Computer science; Artificial intelligence; Bootstrapping (finance); Prior probability; Bellman equation; Convergence (economics); Machine learning; Temporal difference learning; Bayesian probability; Process (computing); Markov process; Mathematical optimization; Mathematics; Econometrics

Metrics

FWCI (Field Weighted Citation Impact): 3.08

Citation Normalized Percentile: 0.92

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Adversarial Robustness in Machine Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Multigrid Reinforcement Learning with Reward Shaping

Reward Shaping in Episodic Reinforcement Learning

Reward Shaping Based Federated Reinforcement Learning

Plan-based reward shaping for reinforcement learning

Hindsight Reward Shaping in Deep Reinforcement Learning