Yantao Song, Yunli Lu, Lu Chen, Yimin Luo
Segmentation is an important prerequisite for developing modern healthcare systems, particularly for disease diagnosis and treatment planning. In medical image segmentation, the U-shaped architecture, commonly referred to as U-Net, has become the de facto standard and achieved remarkable success. However, owing to the intrinsic locality of convolution operations, U-Net is limited in explicitly modeling long-range dependencies. Recent Transformer-based models, designed for sequence-to-sequence prediction, have emerged as an alternative to traditional architectures, featuring innate global self-attention mechanisms. Unfortunately, they can suffer from limited localization ability due to a lack of sufficient low-level detail. To combine the merits of both Transformers and U-Net, in this paper we propose a novel dual-channel self-attention U-network that extracts features through two parallel channels, a CNN branch and a Transformer branch. In contrast to previous models, we introduce two hierarchical feature-fusion strategies operating along both the spatial and channel dimensions. Moreover, to further improve performance, we construct a loss function that dynamically adjusts its weights according to the output of each layer. Experimental results on five different datasets show that our method consistently outperforms state-of-the-art methods and generalizes well across various medical image modalities.
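The abstract describes fusing CNN and Transformer feature maps along both the channel and spatial dimensions, but gives no implementation details. The following NumPy sketch is therefore purely illustrative, not the authors' method: the function name `fuse_features`, the softmax channel gate, and the sigmoid spatial gate are all assumptions chosen to show what "fusion from two dimensions" could look like in code.

```python
import numpy as np

def fuse_features(cnn_feat, trans_feat):
    """Illustrative fusion of two feature maps of shape (C, H, W).

    Hypothetical sketch only: channel attention reweights each channel
    by its global average response; spatial attention gates each
    location by its mean activation across channels.
    """
    # Stack the two branches along the channel axis: (2C, H, W).
    concat = np.concatenate([cnn_feat, trans_feat], axis=0)

    # Channel-dimension fusion: softmax over per-channel global averages.
    channel_desc = concat.mean(axis=(1, 2))                   # (2C,)
    channel_w = np.exp(channel_desc) / np.exp(channel_desc).sum()
    channel_out = concat * channel_w[:, None, None]

    # Spatial-dimension fusion: sigmoid gate over per-location means.
    spatial_desc = concat.mean(axis=0)                        # (H, W)
    spatial_w = 1.0 / (1.0 + np.exp(-spatial_desc))
    spatial_out = concat * spatial_w[None, :, :]

    # Sum the two fused views; shape stays (2C, H, W).
    return channel_out + spatial_out

# Toy usage with random 4-channel, 8x8 feature maps from each branch.
cnn_feat = np.random.rand(4, 8, 8)
trans_feat = np.random.rand(4, 8, 8)
fused = fuse_features(cnn_feat, trans_feat)
print(fused.shape)
```

In a real dual-branch network these inputs would be intermediate activations from the CNN and Transformer encoders, and the fused map would feed the decoder at the corresponding resolution level.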