Object pose estimation involves computing the six-degrees-of-freedom pose of a rigid object. A typical approach uses convolutional neural networks to detect keypoints and then estimates the pose from these points. Methods that combine RGB images and point clouds to extract keypoints remain an active area of interest. In this work, we propose an offset estimation network based on fused features to estimate pose. We first propose a more effective encoding method for fusing image features into point cloud features. We then design an offset estimation network to predict keypoint locations. Next, we estimate multiple poses at two levels from the predicted keypoints. Finally, we aggregate these poses into the final pose according to the weight of each pose. Our experiments show that our method outperforms other approaches on the LineMOD benchmark dataset.
Wei-Bai Duan, Qishen Li, Sihao Yuan, Xiao Yu
Rui Wang, Jiangwei Tong, Xiangyang Wang
Zongwang Han, Long Chen, Shiqing Wu
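The abstract's pipeline of estimating a pose from predicted 3D keypoints and then weight-averaging several candidate poses can be sketched with standard tools. Below is a minimal illustration, assuming a least-squares rigid fit via the Kabsch algorithm and a chordal-mean rotation average; the paper's actual fitting and weighting schemes are not specified here, so all function names and choices are illustrative.

```python
import numpy as np

def fit_pose(model_pts, pred_pts):
    """Least-squares rigid pose (R, t) aligning model keypoints to
    predicted keypoints (Kabsch algorithm; an assumed, standard choice)."""
    mu_m = model_pts.mean(axis=0)
    mu_p = pred_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (pred_pts - mu_p)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t

def aggregate_poses(Rs, ts, weights):
    """Weighted average of candidate poses: translations by a weighted mean,
    rotations by projecting the weighted sum of rotation matrices back onto
    SO(3) (chordal L2 mean). The weighting scheme itself is assumed."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    t = (w[:, None] * np.asarray(ts)).sum(axis=0)
    M = sum(wi * Ri for wi, Ri in zip(w, Rs))
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return R, t
```

Given per-object keypoints predicted at two levels, each level would yield a `(R, t)` candidate via `fit_pose`, and `aggregate_poses` would fuse them into the final estimate.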