Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning

Jiechuan Jiang; Zongqing Lu

doi:10.1609/aaai.v37i7.25973


JOURNAL ARTICLE


Year: 2023
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 37 (7)
Pages: 8050-8059
Publisher: Association for the Advancement of Artificial Intelligence



Abstract

Offline reinforcement learning can learn effective policies from a fixed dataset, which is promising for real-world applications. However, in offline decentralized multi-agent reinforcement learning, the discrepancy between the behavior policy and the learned policy means that the transition dynamics in offline experiences do not match the transition dynamics in online execution. This mismatch creates severe errors in value estimates and leads to uncoordinated, low-performing policies. One way to overcome this problem is to bridge offline training and online tuning. However, considering both deployment efficiency and sample efficiency, only very limited online experience can be collected, so online data alone is insufficient for updating the agent policies. To use both offline and online experiences to tune the agents' policies, we introduce online transition correction (OTC), which implicitly corrects the offline transition dynamics by modifying sampling probabilities. We design two types of distance, embedding-based and value-based, to measure the similarity between transitions, and further propose an adaptive rank-based prioritization that samples transitions according to this similarity. OTC is simple yet effective at increasing data efficiency and improving agent policies in online tuning. Empirically, OTC outperforms baselines in a variety of tasks.
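The abstract does not give formulas for the distances or for the adaptive prioritization, so the following is only a minimal sketch of generic rank-based prioritized sampling, not the authors' OTC procedure. It assumes each offline transition already has a scalar distance to the online transitions (smaller means more similar). The function names, the `distances` input, and the exponent `alpha` are hypothetical and introduced here for illustration.

```python
import random


def rank_based_probabilities(distances, alpha=1.0):
    """Map per-transition distances to rank-based sampling probabilities.

    Assumption: the most similar transition (smallest distance) gets rank 1,
    and p_i is proportional to (1 / rank_i) ** alpha. With alpha = 0 the
    sampling is uniform; larger alpha favors similar transitions more.
    """
    # Indices sorted from most similar to least similar.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    weights = [0.0] * len(distances)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** alpha
    total = sum(weights)
    return [w / total for w in weights]


def sample_offline_batch(distances, batch_size, alpha=1.0, seed=None):
    """Draw indices of offline transitions according to the rank-based probabilities."""
    rng = random.Random(seed)
    probs = rank_based_probabilities(distances, alpha)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical distances for three offline transitions.
    example_distances = [0.9, 0.1, 0.5]
    print(rank_based_probabilities(example_distances))
    print(sample_offline_batch(example_distances, batch_size=4, seed=0))
```

Using ranks rather than raw distances makes the sampling insensitive to the scale of the distance measure, which is one common reason for choosing rank-based prioritization; how the paper adapts the prioritization during tuning is not stated in the abstract and is not modeled here.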

Keywords:

Reinforcement learning; Computer science; Offline learning; Online and offline; Embedding; Rank (graph theory); Similarity (geometry); Sample (material); Variety (cybernetics); Software deployment; Online algorithm; Artificial intelligence; Machine learning; Online learning; Data mining; Algorithm; World Wide Web; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.43

Citation Normalized Percentile: 0.52

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Smart Grid Energy Management

Physical Sciences → Engineering → Electrical and Electronic Engineering

Data Stream Mining Techniques

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

Offline Decentralized Multi-Agent Reinforcement Learning


Decentralized Deterministic Multi-Agent Reinforcement Learning

Decentralized and partially decentralized multi-agent reinforcement learning