JOURNAL ARTICLE

Deep Models for Rigid Objects Real-Time Pose Estimation

Zhao, Jianyu

Year: 2024 Journal:   CLOK (University of Central Lancashire)   Publisher: University of Central Lancashire

Abstract

Accurate and robust six degrees of freedom (6-DoF) pose estimation of rigid objects is one of the fundamental tasks in computer vision, with wide-ranging applications spanning industrial automation, augmented reality, and medical intervention. However, most existing methods rely on knowledge of objects’ 3D models and depth measurements, and often require time-consuming iterative refinement to improve accuracy, which limits their broader applicability. This PhD thesis is primarily motivated by the desire to overcome these limitations. It presents a comprehensive study of the 6-DoF pose estimation problem. Drawing inspiration from the latest deep learning pose estimation methods, a novel 6-DoF pose estimation framework named Auto-Pose is proposed, which combines latent space representations of deep neural networks with supervised learning algorithms. The proposed framework consists of three novel autoencoder-based methods: DALSR-Pose, CVML-Pose, and CVAM-Pose. These methods are specifically designed to address the limitations of existing approaches, enabling the estimation of rigid objects’ 6-DoF poses from a single colour image in real time, without access to explicit 3D models of the objects, depth data, or any post-refinement. The fundamental idea is to implicitly learn intermediate representations of objects in the latent space from colour images alone, and then estimate the 6-DoF poses from the latent representations using regression-based algorithms such as the multilayer perceptron (MLP), k-nearest neighbours (KNN), and random forest (RF). The proposed methods operate in real time and are applicable in complex scenarios, including textured and texture-less objects represented in low-resolution images with heavy occlusion and clutter.
Extensive experiments and evaluation results across multiple publicly available benchmark datasets demonstrate the superiority of the proposed framework over existing methods that similarly use latent space representations, improving pose estimation accuracy by 30%. It also achieves results comparable to other state-of-the-art methods that use 3D models. The thesis makes significant contributions to the field of 6-DoF pose estimation, facilitating the development of model-free estimation algorithms. The novelty of the work rests in the proposed autoencoder-based methods, which achieve competitive performance compared to the state of the art using only data from a monocular camera, without the object’s 3D model, depth measurements, or the iterative refinement often essential to existing methods.
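The pipeline described above — encode a colour image into a latent code, then regress the 6-DoF pose from that code — can be sketched minimally. The snippet below is a hypothetical illustration, not the thesis's actual architecture: a fixed random linear projection stands in for a trained convolutional encoder, and a hand-rolled k-nearest-neighbours regressor (one of the three regression options the abstract names) predicts a 6-D pose vector from the latent space. All dimensions, names, and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained autoencoder's encoder: a fixed random
# linear projection mapping a flattened 32x32 RGB image to a 128-D latent code.
W = rng.standard_normal((32 * 32 * 3, 128)) / np.sqrt(32 * 32 * 3)

def encode(images):
    """Map a batch of images of shape (N, 32, 32, 3) to latent codes (N, 128)."""
    return images.reshape(len(images), -1) @ W

def knn_pose(query_latent, train_latents, train_poses, k=5):
    """Estimate a pose as the mean pose of the k nearest training latents."""
    dists = np.linalg.norm(train_latents - query_latent, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_poses[nearest].mean(axis=0)

# Toy training set: 100 images with known 6-DoF poses
# (3 translation + 3 rotation parameters per object).
train_images = rng.random((100, 32, 32, 3))
train_poses = rng.random((100, 6))
train_latents = encode(train_images)

# Inference: encode a query image, then regress its pose from the latent space.
query_image = rng.random((1, 32, 32, 3))
pose = knn_pose(encode(query_image)[0], train_latents, train_poses)
print(pose.shape)  # (6,)
```

In the thesis's framework the encoder is learned, and the KNN step could equally be replaced by an MLP or random forest regressor fitted on the same latent codes; the key design point is that no 3D model or depth input enters the pipeline at any stage.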

Keywords:
Pose; 3D pose estimation; Articulated body pose estimation; Deep learning; Perception; Artificial neural network; Estimation

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.45

Topics

Robot Manipulation and Learning
Physical Sciences →  Engineering →  Control and Systems Engineering
Robotics and Sensor-Based Localization
Physical Sciences →  Engineering →  Aerospace Engineering
Robotic Mechanisms and Dynamics
Physical Sciences →  Engineering →  Control and Systems Engineering

Related Documents

JOURNAL ARTICLE

Robust Monocular Pose Estimation of Rigid 3D Objects in Real-Time

Tjaden, Henning

Journal:   Universitätsbibliothek Johannes Gutenberg Universität Mainz Year: 2019
JOURNAL ARTICLE

Real-time pose estimation of rigid objects in heavily cluttered environments

Blaž Bratanič, Franjo Pernuš, Boštjan Likar, Dejan Tomaževič

Journal:   Computer Vision and Image Understanding Year: 2015 Vol: 141 Pages: 38-51
BOOK-CHAPTER

Model-Based Pose Estimation for Rigid Objects

Manolis Lourakis, Xenophon Zabulis

Journal:   Lecture Notes in Computer Science Year: 2013 Pages: 83-92