Reinforcement Learning with Reward Machines in Stochastic Games

Jueming Hu; Jean-Raphaël Gaglione; Yanze Wang; Zhe Xu; Ufuk Topcu; Yongming Liu

doi:10.3233/faia230380

BOOK-CHAPTER

Year: 2023; Frontiers in Artificial Intelligence and Applications

Abstract

We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We use reward machines to incorporate high-level knowledge of these tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG) to learn the best-response strategy at a Nash equilibrium for each agent. In QRM-SG, the Q-function at a Nash equilibrium is defined in an augmented state space that combines the state of the stochastic game with the states of the reward machines. Each agent learns the Q-functions of all agents in the system. We prove that the Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update their Q-functions based on the best-response strategy at this point. We use the Lemke-Howson method to derive the best-response strategy given the current Q-functions. Three case studies show that QRM-SG learns the best-response strategies effectively: after around 7500 episodes in Case Study I, 1000 episodes in Case Study II, and 1500 episodes in Case Study III. Baseline methods such as Nash Q-learning and MADDPG fail to converge to the Nash equilibrium in all three case studies.
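The role of the augmented state space can be illustrated with a minimal sketch. The code below is not the QRM-SG algorithm and is not taken from the chapter; it only shows, under assumed names (`RewardMachine`, `rollout`) and a made-up task ("pick up the key, then open the door"), how a reward machine turns a reward that depends on history into one that depends only on the pair (game state, reward-machine state).

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Finite-state machine that emits rewards on high-level events (labels).

    Hypothetical illustration; the structure is an assumption, not the
    chapter's formal definition.
    """
    initial: str
    # transitions[(rm_state, label)] = (next_rm_state, reward)
    transitions: dict
    terminal: frozenset = frozenset()

    def step(self, u, label):
        # Unlisted (state, label) pairs self-loop with zero reward.
        return self.transitions.get((u, label), (u, 0.0))


# Made-up task: the "door" event is rewarded only after the "key" event.
rm = RewardMachine(
    initial="u0",
    transitions={("u0", "key"): ("u1", 0.0), ("u1", "door"): ("u2", 1.0)},
    terminal=frozenset({"u2"}),
)


def rollout(rm, events):
    """Replay (game_state, label) events and record augmented states."""
    u, total, trace = rm.initial, 0.0, []
    for s, label in events:
        u_next, r = rm.step(u, label)
        trace.append(((s, u), r))  # augmented state (s, u) and its reward
        total += r
        u = u_next
        if u in rm.terminal:
            break
    return u, total, trace


final_u, total, trace = rollout(
    rm, [("s3", "door"), ("s1", "key"), ("s3", "door")]
)
```

In this sketch the game state `s3` with the `door` event appears twice and yields different rewards, so the reward is non-Markovian in the game state alone. Keyed on the augmented state, `("s3", "u0")` and `("s3", "u1")` are distinct entries, which is the kind of state a Q-table indexed by (game state, reward-machine state, joint action) could use.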

Keywords:

Best response; Reinforcement learning; Nash equilibrium; State space; Computer science; Q-learning; Mathematical optimization; Correlated equilibrium; Epsilon-equilibrium; Markov decision process; Game theory; Artificial intelligence; Repeated game; Mathematical economics; Markov process; Mathematics; Equilibrium selection

Metrics

FWCI (Field Weighted Citation Impact): 1.30

Citation Normalized Percentile: 0.80

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Reinforcement Learning with Stochastic Reward Machines

Reinforcement learning with predefined and inferred reward machines in stochastic games

Concurrent Multiagent Reinforcement Learning with Reward Machines

Counterfactually-Guided Causal Reinforcement Learning with Reward Machines

Pushdown Reward Machines for Reinforcement Learning