Reinforcement learning algorithms based on deep neural networks have made considerable progress in multi-agent interactive training. However, the training cost of a model rises sharply as network depth and the number of agents increase, so designing an efficient multi-agent reinforcement learning method is a focus of current research. In this paper, the policy network is trained with Proximal Policy Optimization (PPO), a standard policy gradient algorithm. The training data are represented as a knowledge graph, from which key information is extracted, so that training can be completed using only a small amount of the multi-agent interaction data. This reduces the number of training iterations and improves convergence efficiency. The paper thus combines a knowledge graph with the PPO algorithm to address the multi-agent reinforcement learning problem. The results can be applied to intelligent decision-making for unmanned clusters.
Luyi Bai, Mingzhuo Chen, Qianwen Xiao
ZHOU Jiawei, SUN Yuxiang, XUE Yufan, XIANG Qi, WU Ying, ZHOU Xianzhong
Haoqiang Chen, Yadong Liu, Zongtan Zhou, Dewen Hu, Ming Zhang
Zixuan Li, Xiaolong Jin, Saiping Guan, Yuanzhuo Wang, Xueqi Cheng