To address the low accuracy of underwater target detection caused by blurred optical imaging, overlapping targets, and complex backgrounds, we propose a lightweight underwater detection algorithm based on YOLOv5s that combines a contextual multi-head self-attention mechanism with cross-scale fusion. First, we propose an improved multi-head self-attention module with contextual information interaction (CMHSA) to replace the convolutional module in the backbone. It strengthens the global dependence of deep semantic information and, through cross-layer connections, makes full use of the rich contextual information in the self-attention layer to improve the extraction of target features. Second, we introduce a hybrid convolution, GSConv, to reduce the number of model parameters without affecting accuracy. Finally, we propose a cross-scale connected path aggregation network (CSCPANet) that fully integrates the strong localization information carried by shallow layers with the rich semantic information of deep layers, which helps improve detection accuracy when target scales vary widely. Experimental results on the URPC dataset show that the improved algorithm effectively improves detection accuracy while reducing model size.
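The parameter saving attributed to GSConv can be illustrated with a rough weight count. The sketch below is not taken from this work. It assumes the commonly described GSConv layout: a dense convolution producing half of the output channels, a 5x5 depthwise convolution producing the other half, then concatenation and a weight-free channel shuffle. The function names and the 256-channel layer size are hypothetical, and bias and normalization weights are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2-D convolution without bias."""
    return (c_in // groups) * c_out * k * k


def gsconv_params(c_in, c_out, k, dw_k=5):
    """Weight count of a GSConv-style layer (assumed layout, see lead-in).

    Assumption: a dense k x k convolution produces c_out // 2 channels,
    a depthwise dw_k x dw_k convolution produces the other half, and the
    two halves are concatenated and shuffled (the shuffle has no weights).
    """
    half = c_out // 2
    dense = conv_params(c_in, half, k)
    depthwise = conv_params(half, half, dw_k, groups=half)
    return dense + depthwise


# Hypothetical layer: 256 -> 256 channels with a 3 x 3 kernel.
standard = conv_params(256, 256, 3)
gs = gsconv_params(256, 256, 3)
print(standard, gs, round(gs / standard, 3))  # 589824 298112 0.505
```

Under these assumptions the GSConv-style layer needs roughly half the weights of the standard convolution it replaces, because the depthwise branch adds only a small number of parameters.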
Wei Zhu, Likai Wang, Zuobao Jin, Defeng He
Huimin Shi, Quan Zhou, Yinghao Ni, Xiaofu Wu, Longin Jan Latecki
Zichen Liang, Guang Chen, Zhijun Li, Peigen Liu, Alois Knoll
Jinkang Wang, Xiaohui He, Faming Shao, Guanlin Lu, Qunyan Jiang, Ruizhe Hu, Jinxin Li