In reinforcement learning, the environment matters: a technique that works well in one environment is not guaranteed to work well in another. Partial observability is one of the key issues in applying reinforcement learning to real-world environments, and under partial observability, communication becomes important in multi-agent settings. This paper studies periodic communication for distributed multi-agent reinforcement learning. Through periodic communication, each agent shares an auxiliary observation, which is a compressed version of its own observation. Each agent then makes decisions based on its own observation and the auxiliary observations shared by the other agents. Because communication is periodic, however, no auxiliary observations are shared during the non-communication phase, so their absence must be compensated for. To this end, several methods are proposed for predicting the auxiliary observations of other agents. Simulation results show the effect of the partial observation condition and the performance of each compensation method.
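The communication scheme described above can be illustrated with a minimal sketch. The abstract does not specify the compression or the prediction methods, so everything here is an assumption made for illustration: the average-pooling `compress` function, the two compensation modes (`"hold"`, which reuses the last received value, and `"extrapolate"`, which continues the trend between the last two received values), and all class and function names are hypothetical and are not the methods proposed in the paper.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def compress(observation: Sequence[float], size: int) -> Vector:
    """Hypothetical compressor: average-pool an observation into `size` values."""
    n = len(observation)
    if size <= 0 or size > n:
        raise ValueError("size must be between 1 and len(observation)")
    # Integer chunk boundaries; strictly increasing because size <= n.
    bounds = [i * n // size for i in range(size + 1)]
    return [sum(observation[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


class AuxiliaryEstimator:
    """Estimates another agent's auxiliary observation between communications."""

    def __init__(self, period: int, mode: str = "hold") -> None:
        if period < 1:
            raise ValueError("period must be at least 1")
        if mode not in ("hold", "extrapolate"):
            raise ValueError("mode must be 'hold' or 'extrapolate'")
        self.period = period
        self.mode = mode
        self.last: Optional[Vector] = None  # most recent received value
        self.prev: Optional[Vector] = None  # value received one period earlier
        self.steps_since = 0                # steps since the last reception

    def receive(self, aux: Sequence[float]) -> None:
        """Communication phase: store the auxiliary observation that was shared."""
        self.prev, self.last = self.last, list(aux)
        self.steps_since = 0

    def predict(self) -> Vector:
        """Non-communication phase: advance one step and return an estimate."""
        if self.last is None:
            raise RuntimeError("nothing has been received yet")
        self.steps_since += 1
        if self.mode == "hold" or self.prev is None:
            return list(self.last)
        # Assumed linear trend: per-step change observed over the last period.
        return [
            l + (l - p) / self.period * self.steps_since
            for l, p in zip(self.last, self.prev)
        ]


def run(trajectory: Sequence[Sequence[float]], period: int, mode: str,
        size: int) -> List[Vector]:
    """Return the receiver's view of a sender's auxiliary observation per step."""
    estimator = AuxiliaryEstimator(period, mode)
    view: List[Vector] = []
    for t, observation in enumerate(trajectory):
        if t % period == 0:
            aux = compress(observation, size)  # sender compresses and shares
            estimator.receive(aux)
            view.append(aux)
        else:
            view.append(estimator.predict())   # receiver must compensate
    return view


if __name__ == "__main__":
    sender_observations = [[t, t, t + 2, t + 2] for t in range(5)]
    for mode in ("hold", "extrapolate"):
        print(mode, run(sender_observations, period=2, mode=mode, size=2))
```

In this sketch, the receiving agent would combine its own observation with the returned estimate before choosing an action. Holding the last value is the simplest baseline, while extrapolation assumes the auxiliary observation changes smoothly between communications, which may not hold in a given environment.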
Weichao Mao, Kaiqing Zhang, Erik Miehling, Tamer Başar
Ning Yang, Haijun Zhang, Randall Berry