Crack segmentation in images plays a pivotal role in the monitoring of structural surfaces, serving as a fundamental technique for assessing structural integrity. However, existing methods that rely solely on RGB images exhibit high sensitivity to light conditions, which significantly restricts their adaptability in complex environmental scenarios. To address this, we propose a structure-aware progressive multi-modal fusion network (SPMFNet) for RGB-thermal (RGB-T) crack segmentation. The main idea is to integrate complementary information from RGB and thermal images and incorporate structural priors (edge information) to achieve accurate segmentation. Here, to better fuse multi-layer features from different modalities, a progressive multi-modal fusion strategy is designed. In the shallow encoder layers, two gate control attention (GCA) modules are introduced to dynamically regulate the fusion process through a gating mechanism, allowing the network to adaptively integrate modality-specific structural details based on the input. In the deeper layers, two attention feature fusion (AFF) modules are employed to enhance semantic consistency by leveraging both local and global attention, thereby facilitating the effective interaction and complementarity of high-level multi-modal features. In addition, edge prior information is introduced to encourage the predicted crack regions to preserve structural integrity, which is constrained by a joint loss of edge-guided loss, multi-scale focal loss, and adaptive fusion loss. Experimental results on publicly available RGB-T crack detection datasets demonstrate that the proposed method outperforms both classical and advanced approaches, verifying the effectiveness of the progressive fusion strategy and the utilization of the structural prior.
Yingjie WuShaoqi LiYancheng Li
Minyu WangZhongjie ZhuYuer WangRenwei TuJiuxing WengXiaolin Yu
Jing JinJian‐Qin LiuFengwen ZHAI
Yangxintong LyuIonut SchiopuAdrian Munteanu