JOURNAL ARTICLE

Performance Comparison of Vision Transformer-Based Models in Medical Image Classification

Abstract

In recent years, convolutional neural networks have achieved significant success and are widely used in medical image analysis. However, because the convolution operation works on a local receptive field, it limits the learning of long-range pixel dependencies. Transformer architectures have been successful in natural language processing at encoding long-range dependencies and learning more efficient feature representations. Inspired by this success, this study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models and compares their classification performance. The highest accuracy on each dataset is 96.5% with DeiT on chest X-ray, 91.6% with PVTv2 on breast histology, 91.3% with PVTv2 on retina fundus, and 91.0% with Swin on skin lesion.
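The contrast between a local receptive field and long-range dependencies can be illustrated with a minimal sketch. It is not the implementation evaluated in the study. It assumes identity query, key, and value projections, a single attention head, and four made-up two-dimensional patch embeddings; the function names are hypothetical. It shows that one self-attention layer gives every patch a nonzero weight on every other patch, whereas the receptive field of stacked stride-1 convolutions grows only linearly with depth.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: the query, key, and value projections are the identity,
    which isolates the attention pattern from any learned weights.
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        # Similarity of this patch to every patch, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in tokens
        ]
        weights.append(softmax(scores))
    # Each output is a weighted average of all patch vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


def conv_receptive_field(num_layers, kernel_size=3):
    """Receptive field width of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)


# Hypothetical patch embeddings (illustrative values, not real image data).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights, outputs = self_attention(patches)

for i, row in enumerate(weights):
    print(f"patch {i} attends to:", [round(w, 3) for w in row])

for n in (1, 2, 10):
    print(f"{n} conv layer(s): receptive field width {conv_receptive_field(n)}")
```

Practical convolutional networks enlarge the receptive field faster through striding, pooling, or dilation, so the linear-growth formula is only a simplified baseline for comparison.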

Keywords:
Artificial intelligence; Computer science; Convolutional neural network; Pattern recognition (psychology); Transformer; Computer vision; Deep learning; Pixel; Engineering

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 1.53
Refs: 0
Citation Normalized Percentile: 0.82

Topics

AI in cancer detection
Physical Sciences →  Computer Science →  Artificial Intelligence
Brain Tumor Detection and Classification
Life Sciences →  Neuroscience →  Neurology
Retinal Imaging and Analysis
Health Sciences →  Medicine →  Radiology, Nuclear Medicine and Imaging
