Feng Pan, Bo Ni, Xiantao Cai, Yutao Xie
The U-Net [1] framework, an encoder-decoder architecture, remains a common choice for semantic segmentation in the medical domain. However, due to the intrinsic locality of convolution operations, the U-Net framework cannot capture long-range dependencies. The Transformer [2], which models long-range dependencies through its built-in self-attention mechanism, was first proposed in the natural language processing domain, where it achieved great success; it has since been introduced to computer vision and has produced promising results in downstream tasks such as image classification and segmentation. In this paper, we propose UTransNet, which fuses the Transformer into U-Net to exploit the complementary strengths of convolution layers and the Transformer for medical image segmentation. We evaluate our end-to-end network on the ATLAS dataset, and the results demonstrate that our method outperforms previous U-Net-based methods while using the fewest parameters.
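To illustrate the long-range dependency that self-attention supplies and convolutions lack, here is a minimal NumPy sketch of single-head self-attention applied to a flattened CNN feature map, the kind of operation a Transformer block inserted into a U-Net would perform. This is a generic illustration under assumed shapes, not the paper's actual UTransNet module; the function and weight names (`self_attention`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over flattened spatial positions.

    feat: (H*W, C) feature map reshaped into a token sequence.
    Every output token is a weighted sum over ALL positions,
    so information can flow across the whole image in one step,
    unlike a convolution's fixed local receptive field.
    """
    q, k, v = feat @ w_q, feat @ w_k, feat @ w_v
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (H*W, H*W)
    return scores @ v                                  # (H*W, C)

# Toy bottleneck: an 8x8 feature map with 16 channels (assumed sizes).
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
feat = rng.standard_normal((H * W, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out = self_attention(feat, w_q, w_k, w_v)
print(out.shape)  # (64, 16): same token count, globally mixed features
```

In a hybrid design such as the one described above, the output tokens would be reshaped back to (H, W, C) and passed on to the decoder, so the convolutional encoder/decoder and the attention block cooperate within one network.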