Sithembiso Ntanzi, Serestina Viriri
There have been significant breakthroughs in developing models for segmenting 3D medical images, with many promising results attributed to the incorporation of Vision Transformers (ViTs). However, the fundamental mechanism of transformers, self-attention, has quadratic complexity in the input sequence length, which significantly increases computational requirements, especially for 3D medical images. In this paper, we investigate the UNETR++ model and propose a voxel-focused attention mechanism inspired by the pixel-focused attention of TransNeXt. The core component of UNETR++ is the Efficient Paired Attention (EPA) block, which learns from two interdependent branches: spatial and channel attention. A deficiency of UNETR++ lies in its spatial-attention branch, which projects the keys and values into lower dimensions; this improves efficiency but risks information loss. Our contribution is to replace this projection with a voxel-focused attention design that achieves linear complexity with respect to the input sequence length without low-dimensional projection, reducing the parameter count while preserving representational power, competitive performance, and inference speed. On the Synapse dataset, the enhanced UNETR++ model contains 21.42 M parameters, a 50% reduction from the original 42.96 M, while achieving a competitive Dice score of 86.72%.
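To make the linear-complexity idea concrete, below is a minimal PyTorch sketch of a voxel-focused attention layer in the spirit described above: each voxel query attends only to a fixed k×k×k neighbourhood of keys and values, so cost grows linearly with the number of voxels rather than quadratically. The class name, the window size, and the sliding-window formulation are illustrative assumptions, not the authors' code; the TransNeXt-style design additionally aggregates pooled global tokens, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelFocusedAttention(nn.Module):
    """Illustrative sketch: each voxel attends to a fixed k*k*k local
    window, giving O(N * k^3) cost, linear in the number of voxels N.
    Hypothetical simplification of a voxel-focused spatial branch."""

    def __init__(self, dim: int, window: int = 3):  # window assumed odd
        super().__init__()
        self.window = window
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, D, H, W, C) dense voxel grid of features
        B, D, H, W, C = x.shape
        k = self.window
        q, key, v = self.qkv(x).chunk(3, dim=-1)           # each (B, D, H, W, C)

        def local_windows(t: torch.Tensor) -> torch.Tensor:
            # Gather the k^3 neighbours of every voxel -> (B, N, k^3, C)
            t = t.permute(0, 4, 1, 2, 3)                    # (B, C, D, H, W)
            t = F.pad(t, [k // 2] * 6)                      # pad W, H, D symmetrically
            t = t.unfold(2, k, 1).unfold(3, k, 1).unfold(4, k, 1)
            return t.reshape(B, C, D * H * W, k ** 3).permute(0, 2, 3, 1)

        key_w, v_w = local_windows(key), local_windows(v)   # (B, N, k^3, C)
        q = q.reshape(B, D * H * W, 1, C)                   # one query per voxel
        attn = (q * self.scale) @ key_w.transpose(-2, -1)   # (B, N, 1, k^3)
        out = (attn.softmax(dim=-1) @ v_w).reshape(B, D, H, W, C)
        return self.proj(out)

# Usage: layer = VoxelFocusedAttention(dim=32); y = layer(torch.randn(1, 8, 8, 8, 32))
```

Because the attended window has a fixed size, no low-dimensional projection of keys and values is needed, which is the trade-off the abstract describes: fewer parameters than projection-based spatial attention, without discarding spatial information.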