JOURNAL ARTICLE

6-DoF Grasp Detection Method Based on Vision Language Guidance

Xixing Li, Jiahao Chen, Rui Wu, Tao Liu

Year: 2025   Journal: Processes, Vol. 13(5), Pages: 1598   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Interactive robotic grasping allows a robot to grasp a specific object selected by the user. Most deep-learning-based interactive grasping methods combine a vision-language model with a grasp detection model. However, in existing methods the vision-language model is hard to train and generalizes poorly, and the robot copes badly with small target objects. This paper therefore proposes a vision-language-guided 6-DoF grasp detection method that takes a text instruction and an RGB-D image of the scene as input and outputs the 6-DoF grasp pose of the object referred to by the instruction. To improve the trainability and feature-extraction ability of the vision-language model, a multi-head attention mechanism combined with hybrid normalization is designed. A local attention mechanism is also introduced into the grasp detection model to strengthen the interaction between global and local information in the point cloud data, thereby improving the grasp detection model's ability to grasp small target objects. The proposed method first uses the improved vision-language model to predict the 2D image position of the target object, then uses the improved grasp detection model to predict all graspable poses in the scene, and finally uses the 2D position information to filter out the grasp poses belonging to the target object. The proposed vision-language model and grasp detection model achieve strong performance across scenarios on public datasets while retaining generalization ability. In addition, real-world grasping experiments were conducted, in which the proposed vision-language-guided 6-DoF grasp detection method achieved a grasp success rate of 95%.
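The final filtering step described in the abstract — keeping only the scene-wide grasp candidates that belong to the object localized by the vision-language model — can be sketched as projecting each grasp's 3D center into the image and testing it against the predicted 2D box. This is a minimal illustration, not the paper's implementation; the function names, the box format (x1, y1, x2, y2), and the pinhole-intrinsics matrix K are all assumptions for the sketch.

```python
import numpy as np

def project_points(points_3d, K):
    """Project Nx3 camera-frame points to pixel coordinates with intrinsics K.
    Hypothetical helper: assumes a simple pinhole model, z > 0, no distortion."""
    uvw = points_3d @ K.T          # homogeneous pixel coords (u*z, v*z, z)
    return uvw[:, :2] / uvw[:, 2:3]

def filter_grasps_by_bbox(grasp_centers_3d, bbox, K):
    """Return a boolean mask selecting grasps whose projected center lies
    inside the 2D box (x1, y1, x2, y2) predicted by the vision-language model."""
    uv = project_points(grasp_centers_3d, K)
    x1, y1, x2, y2 = bbox
    return (
        (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
        & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    )

# Example: two grasp candidates, one on the target object, one elsewhere.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
centers = np.array([[0.0, 0.0, 1.0],    # projects to (320, 240) — inside box
                    [0.5, 0.0, 1.0]])   # projects to (570, 240) — outside box
mask = filter_grasps_by_bbox(centers, (300, 220, 340, 260), K)
```

In the full pipeline, the surviving grasp poses (6-DoF position plus orientation) would then be ranked by their detection scores and the best one executed.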

Keywords:
Grasp; Computer vision; Computer science; Artificial intelligence; Human–computer interaction; Programming language

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
References: 38
Citation Normalized Percentile: 0.17

Topics

Robot Manipulation and Learning (Physical Sciences → Engineering → Control and Systems Engineering)
Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Hand Gesture Recognition Systems (Physical Sciences → Computer Science → Human-Computer Interaction)