A multi-goal reinforcement learning (RL) agent aims to achieve, and generalize over, a variety of goals. Because goal-reaching rewards are sparse, the agent suffers from unreliable value estimation and thus cannot efficiently identify the states essential to reaching a specific goal. To address this problem, we propose Exploring Successor Matching (ESM), a framework that trains a goal-conditioned policy and progressively pushes multi-goal exploration toward the promising frontier. ESM adopts the idea of successor features and extends it to a goal-reaching successor mapping, which serves as a more stable state feature under sparse rewards. Given the successor mapping, ESM then explores intrinsic goals that are more likely to be achieved, selected from a diverse set of states in terms of future state occupancy. Experiments on challenging manipulation tasks show that ESM handles sparse rewards well and achieves better sample efficiency.
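To make the successor-feature idea concrete, here is a minimal tabular sketch of a successor-feature TD update on a simple chain environment. This is an illustrative assumption, not ESM's actual method: the paper's goal-reaching successor mapping, feature map, and network architecture are not specified here, and the one-hot features `phi` and chain dynamics are hypothetical.

```python
import numpy as np

# Hypothetical setup: a 5-state chain with one-hot state features.
n_states, gamma, alpha = 5, 0.9, 0.5
phi = np.eye(n_states)                  # phi[s]: feature vector of state s
psi = np.zeros((n_states, n_states))    # psi[s]: successor features of state s

def td_update(s, s_next):
    """Move psi(s) toward the TD target phi(s) + gamma * psi(s_next)."""
    target = phi[s] + gamma * psi[s_next]
    psi[s] += alpha * (target - psi[s])

# Sweep transitions of the chain 0 -> 1 -> 2 -> 3 -> 4 until convergence.
for _ in range(200):
    for s in range(n_states - 1):
        td_update(s, s + 1)

# psi[0] now encodes discounted future state occupancies along the chain:
# approximately [1, 0.9, 0.81, 0.729, 0] (state 4 is terminal here).
```

Under this view, the distance between `psi[s]` vectors compares states by the futures they lead to, which is what makes successor-style features a stable signal when extrinsic rewards are sparse.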
Christopher Hoang, Sungryull Sohn, Jongwook Choi, Wilka Carvalho, Honglak Lee
Raymond Chua, Blake Richards, Doina Precup, Christos Kaplanis
Chenjia Bai, Peng Liu, Wei Zhao, Xianglong Tang