Multimodal Magnetic Resonance Imaging (MRI) is crucial for the accurate diagnosis and segmentation of brain tumors, as different modalities provide complementary information. However, effectively fusing these modalities is challenging: existing methods often rely on simple concatenation or self-attention mechanisms that treat all modalities equally, potentially overlooking nuanced inter-modal dependencies. In this paper, we propose CACE (Cross-Attention Correlation Extraction), an improved variant of the Correlation Extraction (CE) module from the SFusion framework. We focus on enhancing SFusion's CE component while leaving its proven Modal Attention (MA) layer unchanged. We investigate four cross-attention-based CACE variants: CACE v1 (parallel cross-attention), CACE v2 (sequential cross-attention), CACE v3 (DAFTED sequential approach), and CACE v4 (residual fusion). In experiments on the BraTS 2020 dataset, CACE v4 achieves the best performance, with an average Dice score of 80.53%. Because CACE is applied only at the bottleneck layer of the U-Net, our approach achieves effective multimodal fusion with modest computational cost. These results validate the proposed fusion strategy and provide a new framework for multimodal MRI segmentation tasks.
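To illustrate the core idea behind the cross-attention fusion described above, the sketch below implements scaled dot-product cross-attention between two modalities' token features, plus a residual combination in the spirit of CACE v4. This is a minimal NumPy illustration under our own assumptions, not the authors' actual implementation; the function names (`cross_attention`, `residual_fusion`) and the single-head, unprojected formulation are hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Scaled dot-product cross-attention (single head, no learned
    projections): tokens of one modality attend to another modality's
    tokens. query_feats: (Nq, d), context_feats: (Nc, d)."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ context_feats                       # (Nq, d)

def residual_fusion(modality_a, modality_b):
    """Residual-style fusion (CACE v4-like, hypothetical): the
    cross-attended features are added back onto the query modality."""
    return modality_a + cross_attention(modality_a, modality_b)
```

In a U-Net, such a block would operate on the flattened bottleneck feature maps of each MRI modality, keeping the fusion cost low because the bottleneck has the smallest spatial resolution.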
Kangkang Sun, Jiangyi Ding, Qixuan Li, Wei Chen, Heng Zhang, Jiawei Sun, Zhuqing Jiao, Xinye Ni