JOURNAL ARTICLE

UTNetPara: A Hybrid CNN-Transformer Architecture with Multi-Scale Fusion for Whole-Slide Image Segmentation

Abstract

In medical image segmentation, Convolutional Neural Networks (CNNs) have become an efficient and successful solution, although they are limited in their ability to explicitly model long-range dependencies. Transformers have recently demonstrated strong image segmentation capabilities, but they require large amounts of training data. In this study, we present UTNetPara, a hybrid architecture that integrates Transformer components into a U-shaped CNN to improve segmentation accuracy on a medium-sized dataset. Self-attention modules are applied in both the encoder and the decoder to capture long-range dependencies at different scales, and efficient self-attention mechanisms with relative position encoding are employed to reduce the computational cost. For evaluation, we use a fully annotated dataset of whole-slide images scanned from periodic acid-Schiff (PAS)-stained mouse kidney tissue. The proposed method is trained to segment the main renal structures: glomerular tuft, glomerulus including Bowman's capsule, tubules, arteries, arterial lumina, and veins. Our experimental results indicate that UTNetPara achieves better segmentation performance than the other state-of-the-art models evaluated.
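The abstract does not specify how the efficient self-attention or the relative position encoding is implemented. The sketch below illustrates one common way to obtain such savings, assumed here purely for illustration: queries are computed at every pixel, while keys and values are computed from a spatially pooled copy of the feature map, and a relative-position bias is added to the attention logits. For a map with n = H·W pixels, this shrinks the score matrix from n × n to n × n/k². The function names, the pooling factor `k`, and the bias tables `rel_h`/`rel_w` are hypothetical and are not taken from the UTNetPara paper. The code is single-head and NumPy-only, and it omits training.

```python
import numpy as np


def avg_pool2d(x, k):
    """Downsample an (H, W, C) map with non-overlapping k x k average pooling."""
    h, w, c = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def efficient_self_attention(x, wq, wk, wv, rel_h, rel_w, k=2):
    """Single-head self-attention over an (H, W, C) feature map (sketch).

    Queries are computed at every pixel, but keys and values come from a
    k x k average-pooled copy of the map, so the score matrix has shape
    (H*W, H*W / k**2) instead of (H*W, H*W).

    rel_h (length 2H-1) and rel_w (length 2W-1) are hypothetical relative-
    position bias tables, indexed by the row/column offset between a query
    pixel and the top-left pixel of each pooled key cell.
    """
    h, w, c = x.shape
    q = x.reshape(-1, c) @ wq                      # (H*W, D)
    pooled = avg_pool2d(x, k)                      # (H/k, W/k, C)
    ph, pw = pooled.shape[:2]
    keys = pooled.reshape(-1, c) @ wk              # (H*W/k^2, D)
    values = pooled.reshape(-1, c) @ wv            # (H*W/k^2, D)

    scores = q @ keys.T / np.sqrt(q.shape[1])      # scaled dot-product logits

    # Row/column coordinates of each query pixel and of each key cell,
    # with key cells mapped back to full resolution (top-left pixel).
    qi, qj = np.divmod(np.arange(h * w), w)
    ki, kj = np.divmod(np.arange(ph * pw), pw)
    dh = qi[:, None] - k * ki[None, :] + (h - 1)   # offsets shifted into [0, 2H-2]
    dw = qj[:, None] - k * kj[None, :] + (w - 1)   # offsets shifted into [0, 2W-2]
    scores = scores + rel_h[dh] + rel_w[dw]        # add relative-position bias

    attn = softmax(scores, axis=-1)                # each row sums to 1
    out = (attn @ values).reshape(h, w, -1)        # back to an (H, W, D) map
    return out, attn


# Usage: an 8x8 map with 4 channels, a 3-dim head, and 2x2 pooling of keys/values.
rng = np.random.default_rng(0)
H, W, C, D = 8, 8, 4, 3
x = rng.standard_normal((H, W, C))
wq, wk, wv = (rng.standard_normal((C, D)) for _ in range(3))
out, attn = efficient_self_attention(
    x, wq, wk, wv,
    rel_h=rng.standard_normal(2 * H - 1),
    rel_w=rng.standard_normal(2 * W - 1),
    k=2,
)
print(out.shape, attn.shape)  # (8, 8, 3) (64, 16)
```

In this toy example, 2×2 pooling reduces the attention matrix from 64 × 64 to 64 × 16 while keeping a full-resolution output. A real network might instead downsample keys and values with interpolation or a strided convolution, and would learn the bias tables as parameters. Which variant UTNetPara uses is not stated in the abstract.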

Keywords:
Computer science, Architecture, Artificial intelligence, Computer vision, Transformer, Image segmentation, Segmentation, Fusion, Pattern recognition (psychology), Engineering, Electrical engineering, Geography

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 1.06
References: 26
Citation Normalized Percentile: 0.68

Topics (all classified under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Advanced Neural Network Applications
Advanced Image and Video Retrieval Techniques
Medical Image Segmentation Techniques
