Abstract

In this paper, a novel distributed on-policy Actor-Critic algorithm for multi-agent reinforcement learning is proposed. The algorithm combines a temporal-difference scheme with function approximation at the Critic stage and a policy-gradient algorithm at the Actor stage, both derived from a global objective. At both stages, decentralized agreement among the agents is achieved using a linear dynamic consensus strategy. Compared to existing schemes, the algorithm offers an improved convergence rate, better noise immunity, and the possibility of multi-task global optimization.
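The abstract's core idea, combining local temporal-difference updates with linear dynamic consensus among agents, can be illustrated with a minimal sketch. This is not the paper's algorithm: the ring topology, the toy reward model, the step sizes, and the Critic-only update are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: each agent keeps linear Critic weights w_i,
# performs a local TD(0) update, then mixes its weights with neighbours'
# via a doubly stochastic consensus matrix A (ring topology assumed).

rng = np.random.default_rng(0)

n_agents, n_features = 4, 3
gamma, alpha = 0.9, 0.01          # discount factor, Critic step size

# Ring-topology consensus weights (doubly stochastic by construction).
A = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    A[i, i] = 0.5
    A[i, (i + 1) % n_agents] = 0.25
    A[i, (i - 1) % n_agents] = 0.25

W = rng.normal(size=(n_agents, n_features))   # per-agent Critic weights

for step in range(2000):
    # Each agent observes its own transition (phi, reward, phi_next);
    # the reward model r = phi @ 1 is a toy assumption.
    phi = rng.normal(size=(n_agents, n_features))
    phi_next = rng.normal(size=(n_agents, n_features))
    r = phi @ np.ones(n_features)

    # Local TD(0) update with linear function approximation.
    delta = (r + gamma * np.einsum('ij,ij->i', phi_next, W)
               - np.einsum('ij,ij->i', phi, W))
    W = W + alpha * delta[:, None] * phi

    # Linear dynamic consensus step: mix weights with neighbours.
    W = A @ W

# The agents' weight vectors should approximately agree after many steps.
disagreement = np.max(np.abs(W - W.mean(axis=0)))
```

The consensus multiplication `A @ W` contracts the disagreement between agents at every step, while the TD updates drive the (approximately shared) weights toward a common value-function estimate; the paper applies an analogous agreement mechanism at the Actor stage as well.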

Keywords:
Reinforcement learning; Computer science; Artificial intelligence; Human–computer interaction


