Boyin Liu, Zhiqiang Pu, Yi Pan, Jianqiang Yi, Min Chen, Shijie Wang
In multi-agent reinforcement learning (MARL), agents must learn to cooperate by observing the environment and selecting actions that maximize their rewards. However, this learning process can be hampered by myopia, wherein agents' strategies fail to consider the long-term consequences of their actions. A primary cause of this problem is inaccurate estimation of the long-term value of each action. Socially, humans derive future expectation cognition from available information to anticipate potential future outcomes and adjust their actions accordingly, thereby avoiding myopia. Motivated by these insights, this paper proposes a novel framework called QFuture to address the myopia problem. Specifically, we first design a future expectation cognition module (FECM) in this framework to build future expectation cognition into the calculation of the individual action-value (IAV) and the joint action-value (JAV). We model future expectation cognition as random variables in FECM, whose representations are learned by maximizing their mutual information with the future trajectory given current information. Furthermore, a return-based regularizer is designed to reflect "expectation" and ensure informativeness in the future expectation representation module (FERM), which encodes the future trajectory. Experiments on StarCraft II micromanagement tasks and Google Research Football show that QFuture achieves state-of-the-art performance. Demonstrative videos are available at https://sites.google.com/view/qfuture.
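As a rough illustration of the mutual-information objective described above, the sketch below infers a latent "future expectation" from current information, encodes the actual future trajectory (standing in for FERM), ties the two together with an InfoNCE-style lower bound on their mutual information, and adds a return-prediction term as the return-based regularizer. All names (FutureExpectationSketch, expectation_net, future_encoder) and architectural choices here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the FECM/FERM training signal, assuming an
# InfoNCE lower bound is used for the mutual-information term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FutureExpectationSketch(nn.Module):
    def __init__(self, obs_dim: int, traj_dim: int, z_dim: int = 32):
        super().__init__()
        # Infers the latent "future expectation" z from current information.
        self.expectation_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        # Encodes the observed future trajectory (a stand-in for FERM).
        self.future_encoder = nn.Sequential(
            nn.Linear(traj_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        # Head for the return-based regularizer.
        self.return_head = nn.Linear(z_dim, 1)

    def forward(self, current_info, future_traj, returns):
        z = self.expectation_net(current_info)   # (B, z_dim)
        f = self.future_encoder(future_traj)     # (B, z_dim)
        # InfoNCE: each latent should score highest against its own
        # future embedding among all futures in the batch; this is a
        # standard lower bound on I(z; future trajectory).
        logits = z @ f.t()                        # (B, B) similarities
        labels = torch.arange(z.size(0), device=z.device)
        mi_loss = F.cross_entropy(logits, labels)
        # Return-based regularizer: the future embedding must predict
        # the observed discounted return, keeping it informative.
        ret_loss = F.mse_loss(self.return_head(f).squeeze(-1), returns)
        return mi_loss + ret_loss
```

In this reading, the contrastive term aligns what the agent expects with what actually happens, while the return head prevents the future encoding from collapsing to features irrelevant to long-term value.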