Abstract

This paper addresses the task of category-level pose estimation for articulated objects from a single depth image. We present a novel category-level approach that correctly accommodates object instances previously unseen during training. We introduce Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH) - a canonical representation for different articulated objects in a given category. As the key to achieve intra-category generalization, the representation constructs a canonical object space as well as a set of canonical part spaces. The canonical object space normalizes the object orientation, scales and articulations (e.g. joint parameters and states) while each canonical part space further normalizes its part pose and scale. We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space. By leveraging the canonicalized joints, we demonstrate: 1) improved performance in part pose and scale estimations using the induced kinematic constraints from joints; 2) high accuracy for joint parameter estimation in camera space.

Keywords:
Pose Artificial intelligence Object (grammar) Computer vision Representation (politics) Computer science Canonical correlation Generalization Point cloud Articulated body pose estimation Orientation (vector space) Segmentation 3D pose estimation Pattern recognition (psychology) Mathematics Geometry

Metrics

181
Cited By
12.39
FWCI (Field Weighted Citation Impact)
39
Refs
0.99
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Robotics and Sensor-Based Localization
Physical Sciences →  Engineering →  Aerospace Engineering
Robot Manipulation and Learning
Physical Sciences →  Engineering →  Control and Systems Engineering
© 2026 ScienceGate Book Chapters — All rights reserved.