This paper presents a novel approach to robotic grasping that integrates embodied visual navigation with reinforcement learning. The primary objective is to determine the optimal location at which a robot should stand to grasp a target object successfully. The work is motivated by a gap in the literature: navigation and grasping are typically treated as separate problems, which leads to suboptimal performance. Our approach leverages multimodal sensory data, including RGB images, depth images, and semantic information, to guide the robot's navigation, and uses deep reinforcement learning so the robot can learn navigation strategies directly from visual input. The effectiveness of the approach is demonstrated through experiments in both simple and complex scenes with varying numbers of obstacles. The results show that our method achieves a high success rate and fast grasp execution across scenarios, outperforming competing methods. This work contributes to the field of robotic grasping by unifying embodied visual navigation with deep reinforcement learning and validating the combination through rigorous experiments.
Xinzhu Liu, Di Guo, Huaping Liu, Fuchun Sun
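To make the described pipeline concrete, the sketch below shows one plausible way a policy network could fuse the three modalities mentioned in the abstract (RGB, depth, semantic input) and output discrete navigation actions for reinforcement learning. The abstract does not specify the architecture, so everything here is an assumption: the class name `MultimodalNavPolicy`, the per-modality CNN encoders, the late-fusion concatenation, and the four-way discrete action set are all hypothetical choices for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultimodalNavPolicy(nn.Module):
    """Hypothetical actor-critic policy fusing RGB, depth, and semantic input.

    Assumed design (not from the paper): a small CNN encoder per modality,
    late fusion by concatenation, and linear actor/critic heads.
    """

    def __init__(self, num_actions: int = 4, feat_dim: int = 128):
        super().__init__()
        # One lightweight encoder per modality: RGB has 3 channels,
        # depth and the semantic mask are treated as 1 channel each.
        self.rgb_enc = self._make_encoder(3, feat_dim)
        self.depth_enc = self._make_encoder(1, feat_dim)
        self.sem_enc = self._make_encoder(1, feat_dim)
        # Actor head maps fused features to action logits (assumed action
        # set: move forward, turn left, turn right, stop); critic head
        # provides the value baseline used by most policy-gradient methods.
        self.actor = nn.Linear(3 * feat_dim, num_actions)
        self.critic = nn.Linear(3 * feat_dim, 1)

    @staticmethod
    def _make_encoder(in_ch: int, feat_dim: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, rgb, depth, semantic):
        # Late fusion: encode each modality, then concatenate features.
        fused = torch.cat(
            [self.rgb_enc(rgb), self.depth_enc(depth), self.sem_enc(semantic)],
            dim=-1,
        )
        return self.actor(fused), self.critic(fused)


if __name__ == "__main__":
    policy = MultimodalNavPolicy()
    rgb = torch.rand(1, 3, 128, 128)       # camera image
    depth = torch.rand(1, 1, 128, 128)     # depth map
    semantic = torch.rand(1, 1, 128, 128)  # semantic mask
    logits, value = policy(rgb, depth, semantic)
    # Sample a navigation action from the policy distribution.
    action = torch.distributions.Categorical(logits=logits).sample()
    print(action.item(), value.item())
```

Under these assumptions, the network would plug into any standard actor-critic training loop (e.g., PPO or A2C), with the learned policy steering the robot toward a standing position from which grasping succeeds.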