JOURNAL ARTICLE

Efficient monocular coarse-to-fine object pose estimation

Abstract

The vision and robotics communities have developed different methods for object pose estimation, each with its own advantages and disadvantages. A popular method stores model images of each object from all possible viewpoints, together with their 2D-to-3D correspondences, in a database offline. Local feature matching is then applied between the current view and the model images in the database. For the top-matched image, a PnP algorithm followed by RANSAC is used to estimate the object pose. Such a method has good accuracy but lacks efficiency, consuming O(MN²) time, where N and M are the number of features in a model and the number of models, respectively. To tackle this problem, we propose a method that improves efficiency in two ways. First, we employ a hierarchical clustering method to find the proper number of model images to represent each object, leading to a decrease in M. Second, we propose a coarse-to-fine object pose estimation method that decreases the time needed to find the best matching model image. Specifically, in the coarse step, given an image, the most similar model image is retrieved using a global image descriptor, which we compute using a pre-trained deep neural network. Then, in the fine step, a local descriptor feature matching method is applied to find matching keypoints between the current image and the model image found in the coarse step. Finally, with pre-registered 2D-to-3D correspondences for each model, an accurate object pose is calculated using the PnP and RANSAC approach. The performance of our method is evaluated on the Amazon Picking Challenge dataset.
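The coarse and fine steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: cosine similarity as the measure for comparing global descriptors, a nearest-neighbour ratio test for local descriptor matching, and the function names (coarse_retrieve, fine_match, build_pnp_input) and the ratio parameter. Descriptors are plain Python lists so that the example needs only the standard library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_retrieve(query_global, model_globals):
    """Coarse step: index of the model image whose global descriptor
    is most similar to the query's global descriptor."""
    scores = [cosine_similarity(query_global, g) for g in model_globals]
    return max(range(len(scores)), key=scores.__getitem__)


def squared_distance(a, b):
    """Squared Euclidean distance between two local descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fine_match(query_locals, model_locals, ratio=0.8):
    """Fine step: match each query descriptor to its nearest model
    descriptor, keeping it only if it passes a ratio test (an assumed
    filtering rule, not taken from the abstract).

    Returns a list of (query_index, model_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query_locals):
        dists = sorted(
            (squared_distance(q, m), mi) for mi, m in enumerate(model_locals)
        )
        # Distances are squared, so the ratio threshold is squared too.
        if len(dists) >= 2 and dists[0][0] < (ratio ** 2) * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches


def build_pnp_input(matches, query_points_2d, model_points_3d):
    """Pair each matched 2D query keypoint with the pre-registered 3D
    point of the matched model keypoint, ready for a PnP solver."""
    return [(query_points_2d[qi], model_points_3d[mi]) for qi, mi in matches]
```

In this sketch only the single model image returned by coarse_retrieve is passed to fine_match, so local descriptors are compared against one model rather than all M. The 2D-to-3D pairs from build_pnp_input could then be handed to a PnP solver with RANSAC, for example OpenCV's cv2.solvePnPRansac, which is left out here to keep the example dependency-free.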

Keywords:
RANSAC, Artificial intelligence, Pose, Matching (statistics), Computer science, Object (grammar), Computer vision, Feature (linguistics), Image (mathematics), Pattern recognition (psychology), 3D pose estimation, Artificial neural network, Cluster analysis, Mathematics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.53
Refs: 35
Citation Normalized Percentile: 0.84

Topics

Robotics and Sensor-Based Localization (Physical Sciences → Engineering → Aerospace Engineering)
Robot Manipulation and Learning (Physical Sciences → Engineering → Control and Systems Engineering)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)