Embodied visual navigation is an important task in which an agent learns to navigate to a specific target object from egocentric visual observations by performing actions in the environment. However, learning-based methods suffer from a mismatch between the action spaces used during training and testing, and few methods have been developed to address this problem. In this paper, we formulate a novel problem, action-insensitive embodied visual navigation, in which the agent's action space differs between training and testing. We build a robust adversarial learning framework that learns a general and robust policy able to adapt to different action spaces. In the first stage, adversarial training, the model learns a robust feature representation of the agent's states; in the second stage, adaptation training, it transfers the learned policy to new action spaces using fewer training samples. Experiments in 3D indoor scenes validate the effectiveness of the proposed approach.
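The abstract does not specify the architecture or the adversarial objective, so the following is only a minimal sketch of the general idea behind the second stage, under stated assumptions: a policy is split into a state encoder and an action head, and when the action space changes, the encoder is kept frozen while a new head sized to the new action space is re-fit on a few (observation, action) samples. The encoder here is randomly initialized rather than adversarially trained, and every name (`Policy`, `adapt_to`, `fit_head`, `TRAIN_ACTIONS`, `TEST_ACTIONS`), the action lists, the dimensions, and the linear/tanh layers are hypothetical illustrations, not the authors' method.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical action spaces: names and step sizes are illustrative only.
TRAIN_ACTIONS = ["forward_0.25m", "turn_left_30deg", "turn_right_30deg", "stop"]
TEST_ACTIONS = ["forward_0.5m", "turn_left_45deg", "turn_right_45deg",
                "look_up", "look_down", "stop"]


def random_matrix(rows: int, cols: int, scale: float, rng: random.Random) -> Matrix:
    """Uniform random weights in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class Policy:
    """Hypothetical policy: shared state encoder plus a head for one action space."""

    def __init__(self, obs_dim: int, feat_dim: int, actions: Sequence[str], seed: int = 0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        self.encoder = random_matrix(feat_dim, obs_dim, 1.0, rng)
        self.actions = list(actions)
        self.head = random_matrix(len(self.actions), feat_dim, 0.1, rng)

    def features(self, obs: Sequence[float]) -> List[float]:
        # State representation; it does not depend on the action space.
        return [math.tanh(v) for v in matvec(self.encoder, obs)]

    def act(self, obs: Sequence[float]) -> str:
        logits = matvec(self.head, self.features(obs))
        return self.actions[max(range(len(logits)), key=logits.__getitem__)]

    def adapt_to(self, new_actions: Sequence[str], seed: int = 1) -> None:
        # Keep the encoder; replace only the head so it matches the new action space.
        rng = random.Random(seed)
        self.actions = list(new_actions)
        self.head = random_matrix(len(self.actions), self.feat_dim, 0.1, rng)

    def loss(self, samples: Sequence[Tuple[Sequence[float], str]]) -> float:
        """Mean cross-entropy of the head on (observation, action) pairs."""
        total = 0.0
        for obs, action in samples:
            probs = softmax(matvec(self.head, self.features(obs)))
            total -= math.log(probs[self.actions.index(action)])
        return total / len(samples)

    def fit_head(self, samples: Sequence[Tuple[Sequence[float], str]],
                 lr: float = 0.1, epochs: int = 100) -> None:
        # Full-batch gradient descent on the head only; the encoder stays frozen.
        for _ in range(epochs):
            grad = [[0.0] * self.feat_dim for _ in self.actions]
            for obs, action in samples:
                f = self.features(obs)
                probs = softmax(matvec(self.head, f))
                target = self.actions.index(action)
                for k, p in enumerate(probs):
                    # d(-log p_target)/d logit_k = p_k - 1[k == target]
                    coeff = (p - (1.0 if k == target else 0.0)) / len(samples)
                    for j, fj in enumerate(f):
                        grad[k][j] += coeff * fj
            for k in range(len(self.actions)):
                for j in range(self.feat_dim):
                    self.head[k][j] -= lr * grad[k][j]


# Usage: train-time action space, then adapt to a different test-time action space.
policy = Policy(obs_dim=4, feat_dim=8, actions=TRAIN_ACTIONS)
frozen = [row[:] for row in policy.encoder]
policy.adapt_to(TEST_ACTIONS)
few_shot = [([1.0, 0.0, 0.5, -0.5], "forward_0.5m"),
            ([0.0, 1.0, -0.5, 0.5], "look_up"),
            ([-1.0, 0.5, 0.0, 1.0], "stop")]
before = policy.loss(few_shot)
policy.fit_head(few_shot)
after = policy.loss(few_shot)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Because only the head is re-fit, the number of trainable parameters during adaptation scales with the size of the new action space rather than with the whole network, which is one plausible way a reused state representation could reduce the samples needed; the actual framework may fine-tune more of the model.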
Xingchen Wang, Yan Ding, Beichen Shao, Fei Meng, Chao Chen
Yinfeng Yu, Lele Cao, Fuchun Sun, Chao Yang, Huicheng Lai, Wenbing Huang
Xinzhu Liu, Di Guo, Huaping Liu, Xinyu Zhang, Fuchun Sun
Jiaxin Li, Wen-Chih Huang, Zan Wang, Wei Liang, Huijun Di, Feng Liu
Shuang Liu, Masanori Suganuma, Takayuki Okatani