Peng Yao, Yalu Wang, Dongdong Yang, Qiming Liu, Jingyi Yu
Abstract
Accurate depth estimation is essential for unmanned underwater vehicles to perceive their environment effectively during target-tracking tasks. We therefore propose a self-supervised monocular depth estimation framework tailored to underwater scenes, incorporating multi-attention mechanisms and the distinctive optical characteristics of underwater imagery. To address the color distortion of underwater images, caused primarily by light attenuation, we design an adaptive underwater light attenuation loss function that improves the model's adaptability and generalization across diverse underwater scenes. The inherent blurriness of underwater images poses considerable challenges for feature extraction and semantic interpretation, so we employ dilated convolutions jointly with linear spatial-reduction attention (CDC Joint Linear SRA) to capture the local and global features of underwater images, which are then integrated through feature-map fusion. A multi-attention feature enhancement module subsequently enriches the spatial and semantic information of the extracted features. To mitigate the fusion interference arising from semantic discrepancies between feature maps, we introduce a progressive fusion module that balances cross-module features using a two-step feature-refinement strategy. Comparative, ablation, and generalization experiments on the FLSea dataset verify the superiority of the proposed model.
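The abstract does not give the attenuation loss in closed form. A minimal sketch of one plausible formulation follows, assuming the simplified underwater image formation model I_c = J_c * exp(-beta_c * d) + B_c * (1 - exp(-beta_c * d)) with learnable per-channel attenuation beta_c and backscatter B_c; all names are illustrative and the paper's exact loss may differ.

```python
import torch
import torch.nn as nn

class UnderwaterAttenuationLoss(nn.Module):
    """Hypothetical sketch of an adaptive light-attenuation loss.

    Assumes the simplified image formation model
        I_c(x) = J_c(x) * exp(-beta_c * d(x)) + B_c * (1 - exp(-beta_c * d(x)))
    with per-channel attenuation beta_c and backscatter B_c learned jointly
    with the depth network. Not the paper's verified formulation.
    """

    def __init__(self):
        super().__init__()
        # Learnable per-channel (R, G, B) attenuation and backscatter.
        self.log_beta = nn.Parameter(torch.zeros(3))   # beta = exp(log_beta) > 0
        self.backscatter = nn.Parameter(torch.full((3,), 0.1))

    def forward(self, image, restored, depth):
        # image:    (B, 3, H, W) observed underwater frame in [0, 1]
        # restored: (B, 3, H, W) estimated attenuation-free radiance J
        # depth:    (B, 1, H, W) predicted depth map
        beta = torch.exp(self.log_beta).view(1, 3, 1, 1)
        backscatter = self.backscatter.view(1, 3, 1, 1)
        transmission = torch.exp(-beta * depth)        # per-channel e^{-beta d}
        # Re-render the observed image from J, d, beta, B and compare.
        rerendered = restored * transmission + backscatter * (1.0 - transmission)
        return torch.abs(rerendered - image).mean()
```

Tying the photometric residual to depth through the transmission term is what lets such a loss adapt to scene-specific attenuation during self-supervised training.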
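Linear SRA, in the PVTv2 sense, pools keys and values to a fixed spatial size so attention cost grows linearly with the number of query tokens. A self-contained sketch is given below; the head count and pool size are assumptions, not the paper's configuration, and the dilated-convolution branch of CDC Joint Linear SRA is omitted.

```python
import torch
import torch.nn as nn

class LinearSRA(nn.Module):
    """Minimal sketch of linear spatial-reduction attention (PVTv2-style)."""

    def __init__(self, dim, num_heads=4, pool_size=7):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)
        # Keys/values are pooled to pool_size x pool_size before attention.
        self.pool = nn.AdaptiveAvgPool2d(pool_size)

    def forward(self, x, h, w):
        # x: (B, N, C) token sequence with N = h * w
        b, n, c = x.shape
        q = self.q(x).reshape(b, n, self.num_heads, c // self.num_heads).transpose(1, 2)
        # Spatially reduce the tokens that produce keys and values.
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        feat = self.pool(feat).reshape(b, c, -1).transpose(1, 2)   # (B, 49, C)
        k, v = self.kv(feat).chunk(2, dim=-1)
        k = k.reshape(b, -1, self.num_heads, c // self.num_heads).transpose(1, 2)
        v = v.reshape(b, -1, self.num_heads, c // self.num_heads).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale                # (B, heads, N, 49)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)

# Usage: tokens from a 32x32 feature map with 64 channels.
sra = LinearSRA(dim=64)
out = sra(torch.randn(2, 32 * 32, 64), h=32, w=32)   # -> (2, 1024, 64)
```

Because the key/value grid is fixed at 7x7, global context is captured at linear cost, which complements the local receptive fields of the dilated-convolution branch described in the abstract.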