Maximilian Schlosser, Stephan Prettner, Tatyana Ivanovska
Transformers have gained tremendous popularity in the computer vision community and have been successfully applied to classical tasks such as image classification, object detection, and semantic segmentation. One of the most popular architectures is the Shifted Window Vision Transformer (SWIN), which has been extended to the semantic segmentation task. In this work, we consider one of the encoder-decoder architectures, namely the Swin UNETR, and thoroughly analyze it with respect to performance and efficiency. We also propose several architectural changes and discuss their influence on the results. Our findings are cross-validated on several medical datasets.
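The core operation behind the SWIN architecture mentioned above is attention computed within local windows that are cyclically shifted between consecutive blocks. The following is a minimal numpy sketch, not the authors' code, illustrating only the window partition and the cyclic shift on a toy 2D feature map (all array names are illustrative):

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W) map into non-overlapping (w, w) windows.

    Assumes H and W are divisible by w, as in Swin-style models.
    Returns an array of shape (num_windows, w, w).
    """
    H, W = x.shape
    return (x.reshape(H // w, w, W // w, w)
             .transpose(0, 2, 1, 3)
             .reshape(-1, w, w))

# Toy 4x4 feature map and window size 2 -> four 2x2 windows
x = np.arange(16).reshape(4, 4)
wins = window_partition(x, 2)

# The "shifted" variant cyclically rolls the map by w // 2 before
# re-partitioning, so the next block's windows straddle the old borders
shifted = np.roll(x, shift=(-1, -1), axis=(0, 1))
shifted_wins = window_partition(shifted, 2)
```

In the full model, self-attention is computed independently inside each window, and alternating between the plain and shifted partitions lets information propagate across window boundaries.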