Siyu Huang, Bin Hu, Ruiquan Liao, Jiang‐Wen Xiao, Dingxin He, Zhi‐Hong Guan
This paper studies the multi-agent pursuit-evasion problem. When the mathematical model of the agents is unknown, machine learning offers an effective way to design each agent's policy. Because the pursuers cooperate with one another while competing against the evaders, we adopt the deterministic policy gradient method from reinforcement learning as our basic approach. We redesign the reward function and the neural network structure to suit a realistic setting in which the evader has greater speed and acceleration than the pursuers. A distinguishing feature of the algorithm is that the controller takes only the agents' coordinates as input, without other information such as speed. In particular, the algorithm remains effective when the environment is extended to a higher-dimensional space. Finally, we verify the algorithm experimentally.
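The abstract does not state the redesigned reward function, so the following is only an illustrative sketch of a coordinate-only setup, not the authors' formulation. It assumes a shared pursuer reward based on the distance from the closest pursuer to the evader, with a bonus on capture, and a zero-sum evader reward. The names `build_observation`, `pursuer_reward`, `evader_reward`, `capture_radius`, and `capture_bonus` are hypothetical. Because the functions operate on coordinate tuples of any length, the same code applies unchanged in two or three dimensions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_observation(pursuers: List[Point], evader: Point) -> List[float]:
    """Flatten all agent coordinates into one input vector (no velocities)."""
    obs: List[float] = []
    for p in pursuers:
        obs.extend(p)
    obs.extend(evader)
    return obs


def pursuer_reward(
    pursuers: List[Point],
    evader: Point,
    capture_radius: float = 0.5,   # assumed value, for illustration only
    capture_bonus: float = 10.0,   # assumed value, for illustration only
) -> float:
    """Shared team reward: negative distance of the closest pursuer,
    plus a bonus when that pursuer is within the capture radius."""
    closest = min(distance(p, evader) for p in pursuers)
    reward = -closest
    if closest <= capture_radius:
        reward += capture_bonus
    return reward


def evader_reward(pursuers: List[Point], evader: Point) -> float:
    """Zero-sum assumption: the evader receives the negated team reward."""
    return -pursuer_reward(pursuers, evader)
```

For example, with pursuers at `(0.0, 0.0)` and `(3.0, 4.0)` and the evader at `(3.0, 0.0)`, the closest pursuer is 3 units away, so the team reward is `-3.0` and the evader reward is `3.0`.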
Astrid Vanneste, Wesley Van Wijnsberghe, Simon Vanneste, Kevin Mets, Siegfried Mercelis, Steven Latré, Peter Hellinckx
Maryam Kia, Jeffrey A. Cramer, Artur Luczak