Xu Yang, Junqi Cai, Kunbo Li, Xiumin Fan
Convolutional neural networks have shown excellent potential for establishing correspondences between 2D images and 3D objects in object 6D pose estimation, for both dense and sparse methods. However, existing sparse methods use only a single geometric representation between each object pixel and each keypoint. In this work, we explore more accurate keypoint prediction with multiple geometric representations in the sparse setting. First, we use a convolutional neural network to regress a pixel-wise offset vector field, and convert the offset vector field into multiple geometric representations, namely directions and distances. Then we propose a coarse-to-fine keypoint prediction pipeline that uses the multiple geometric representations and a sliding window to compute more accurate 2D keypoint hypotheses. Finally, the object pose is solved by matching 2D-3D correspondences at the sparse keypoints and applying the PnP algorithm. Experimental results on the LMO and T-LESS datasets show that the proposed method significantly outperforms existing sparse methods and also surpasses some state-of-the-art dense methods.
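The conversion from an offset vector field to directions and distances can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `offsets_to_representations` and `keypoint_from_offsets`, the array layout (H, W, 2) with offsets stored as (dx, dy), and the plain averaging used as a coarse estimate are all assumptions made for illustration. The sliding-window refinement and the PnP step described in the abstract are not reproduced here.

```python
import numpy as np

def offsets_to_representations(offsets, eps=1e-8):
    """Split a per-pixel offset field (H, W, 2) into unit directions and distances.

    Assumption: offsets[y, x] = keypoint - (x, y), stored as (dx, dy).
    """
    distances = np.linalg.norm(offsets, axis=-1)
    # Guard against division by zero at the pixel that coincides with the keypoint.
    directions = offsets / np.maximum(distances, eps)[..., None]
    return directions, distances

def keypoint_from_offsets(offsets, mask):
    """Coarse keypoint estimate: mean of (pixel + offset) over object pixels."""
    ys, xs = np.nonzero(mask)
    pixels = np.stack([xs, ys], axis=-1).astype(np.float64)
    votes = pixels + offsets[ys, xs]
    return votes.mean(axis=0)

# Toy data: a noise-free offset field pointing at a hypothetical keypoint (x=5, y=3).
H, W = 8, 8
keypoint = np.array([5.0, 3.0])
ys, xs = np.mgrid[0:H, 0:W]
grid = np.stack([xs, ys], axis=-1).astype(np.float64)
offsets = keypoint - grid

# Hypothetical object mask covering a rectangle of pixels.
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 1:7] = True

directions, distances = offsets_to_representations(offsets)
estimate = keypoint_from_offsets(offsets, mask)
```

With noise-free offsets every pixel votes for the same location, so the average recovers the keypoint exactly. With real network predictions the votes scatter, which is the situation a coarse-to-fine scheme is meant to handle.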
Yi Guo, Fei Wang, Hao Chu, Shiguang Wen
Jianhan Mei, Xudong Jiang, Henghui Ding