CNN–Transformer Hybrid Architecture for Underwater Sonar Image Segmentation

Juan Lei; Huigang Wang; Zhiyu Lei; Jiayuan Li; Shaowei Rong

doi:10.3390/rs17040707

JOURNAL ARTICLE

Year: 2025
Journal: Remote Sensing
Vol: 17 (4)
Pages: 707
Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Salient object detection (SOD) in forward-looking sonar images plays a crucial role in underwater detection and rescue tasks. However, existing SOD algorithms struggle to extract salient features and spatial structure information from images with scarce semantic information, uneven intensity distribution, and high noise. Convolutional neural networks (CNNs) have strong local feature extraction capabilities, but they are constrained by their receptive field and cannot readily model long-range dependencies. Transformers, with their self-attention mechanism, can model the global features of a target, but they tend to lose a significant amount of local detail. Mamba models long-range dependencies in long sequence inputs through a selection mechanism, offering a novel approach to capturing long-range correlations between pixels. However, because the saliency of image pixels does not exhibit sequential dependencies, Mamba’s ability to capture global contextual information during the forward pass is somewhat limited. Inspired by multimodal feature fusion learning, we propose a hybrid CNN–Transformer–Mamba architecture, termed FLSSNet. FLSSNet is built upon a CNN and Transformer backbone network and integrates four core submodules that address different technical challenges: (1) The asymmetric dual encoder–decoder (ADED) simultaneously extracts features from different modalities and systematically models both local contextual information and global spatial structure. (2) The Transformer feature converter (TFC) module optimizes the multimodal feature fusion process through feature transformation and channel compression. (3) The long-range correlation attention (LRCA) module enhances the CNN’s ability to model long-range dependencies through the combined use of convolutional kernels, selective sequential scanning, and attention mechanisms, while suppressing noise interference. (4) The recursive contour refinement (RCR) module refines edge contour information through a layer-by-layer recursive mechanism, yielding more precise boundary details. Experimental results show that FLSSNet is highly competitive against 25 state-of-the-art SOD methods, achieving an MAE of 0.04 and an Eξ of 0.973.
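
The two scores quoted at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `mae` and `e_measure` are hypothetical, the maps are assumed to be nested lists with values in [0, 1], and the E-measure shown is the commonly used enhanced-alignment formulation applied to a single binarized prediction. Published evaluations typically sweep thresholds and treat all-foreground or all-background ground truths separately, and the abstract does not say which Eξ variant (mean, max, or adaptive) is reported.

```python
from typing import Sequence

Map = Sequence[Sequence[float]]  # 2-D saliency map, values assumed in [0, 1]


def mae(pred: Map, gt: Map) -> float:
    """Mean absolute error between a predicted saliency map and its ground truth."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def e_measure(pred_bin: Map, gt_bin: Map, eps: float = 1e-8) -> float:
    """Simplified enhanced-alignment measure for one binarized prediction.

    Assumption: no threshold sweep and no special-casing of empty or full
    ground-truth masks, both of which full evaluation toolkits usually add.
    """
    flat_p = [float(v) for row in pred_bin for v in row]
    flat_g = [float(v) for row in gt_bin for v in row]
    n = len(flat_p)
    mean_p = sum(flat_p) / n
    mean_g = sum(flat_g) / n
    score = 0.0
    for p, g in zip(flat_p, flat_g):
        dp, dg = p - mean_p, g - mean_g          # bias (mean-removed) values
        align = 2.0 * dp * dg / (dp * dp + dg * dg + eps)  # alignment in [-1, 1]
        score += (1.0 + align) ** 2 / 4.0        # enhanced alignment in [0, 1]
    return score / n


if __name__ == "__main__":
    prediction = [[0.9, 0.1], [0.2, 0.8]]
    ground_truth = [[1, 0], [0, 1]]
    binarized = [[1 if v >= 0.5 else 0 for v in row] for row in prediction]
    print("MAE:", mae(prediction, ground_truth))
    print("E-measure:", e_measure(binarized, ground_truth))
```

Lower MAE is better, while the E-measure approaches 1 as the binarized prediction aligns with the ground truth at both the pixel and the image level.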

Keywords:

Underwater, Computer science, Sonar, Transformer, Artificial intelligence, Architecture, Computer vision, Marine engineering, Geology, Oceanography, Geography, Electrical engineering, Engineering

Metrics

FWCI (Field Weighted Citation Impact): 38.19

Citation Normalized Percentile: 0.99

Topics

Image Enhancement Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Underwater Acoustics Research

Physical Sciences → Earth and Planetary Sciences → Oceanography
