ZHANG Lei, ZHANG Senhui, YAN Song, YUAN Yuan
To address the slow speed and poor performance of multi-object grasping detection in unstructured environments, a method that performs object detection before grasping detection is proposed. For object detection, the YOLOv5 network is improved with depthwise separable convolutions and a coordinate attention mechanism to accelerate inference. For the grasping task, a single-stage grasping pose detection algorithm is designed. First, considering the interference present in unstructured environments, RGB-D images are selected as the input to the grasping network, and GG-CNN is chosen as the backbone. Second, to enhance the feature extraction capability of the grasping network, the parallel convolutional kernels of different sizes in the Inception-ResNet module are used to broaden the network's receptive field, and a parameter-free three-dimensional attention mechanism is integrated so that the network focuses on grasp-relevant features while suppressing background noise. Finally, a grasping quality evaluation refines the candidate grasping boxes, and the box with the highest confidence score is output. Experimental results show that the improved object detection network has 2 776 708 parameters and runs at 102 frames per second (FPS). On the public Cornell dataset, the improved grasping detection network achieves an accuracy of 96.57% at 54.17 FPS. The two improved networks combined can be deployed on a robotic arm and effectively accomplish grasping tasks in multi-object scenarios, making them suitable for practical industrial applications.
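The speed-up from depthwise separable convolutions can be illustrated with a simple parameter count: a standard k×k convolution uses C_in·C_out·k² weights, while the depthwise-plus-pointwise factorization uses only C_in·k² + C_in·C_out. The channel sizes below are illustrative, not taken from the paper:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel)
    followed by a pointwise 1 x 1 conv (bias omitted)."""
    return c_in * k * k + c_in * c_out

# Example layer: 64 -> 128 channels, 3 x 3 kernel
std = conv_params(64, 128, 3)          # 73728
sep = dw_separable_params(64, 128, 3)  # 8768
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For this example layer the factorization cuts the weights by roughly 8x, which is the kind of reduction that makes the 2 776 708-parameter detector feasible at real-time frame rates.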