Biomedical image classification using vision transformer

Hapsari Peni Agustin Tjahyaningtijas; Muhammad Aamir Nashrullah; Pradini Puspitaningayu; Lusia Rakhmawati; Yuni Yamasari; Jessie R. Paragas

JOURNAL ARTICLE

Year: 2025 Journal: Springer Link (Chiba Institute of Technology) Publisher: Chiba Institute of Technology

Abstract

Particularly in the areas of public health and welfare, biomedical image classification plays a significant role in advancing the Sustainable Development Goals (SDGs). Classifying medical images is a challenging and rapidly developing area of computer vision and artificial intelligence. Deep learning techniques have driven significant progress in the classification of medical images, including MRI, CT, X-ray, and histopathological tissue images. Convolutional neural networks (CNNs) and their variants are well-known deep learning architectures for medical image classification. Their main limitation is that convolution operations focus on local features, which prevents CNNs from capturing global relationships between distant regions of an image. Furthermore, CNNs sometimes require many layers to capture a wide spatial context, which leads to the loss of important information and increased model complexity. The Vision Transformer (ViT) overcomes these shortcomings with a self-attention mechanism that models global relationships among visual components from an early stage. By partitioning the image into patches and processing them concurrently within transformer blocks, ViT efficiently captures long-range dependencies and intricate spatial structures, surpassing CNN performance. This research reviews the application of ViT architectures to biomedical image classification. Ten key studies were analyzed, encompassing tasks such as breast and brain tumor classification, COVID-19 detection, and lung nodule identification. ViT-based models consistently achieved high performance: peak accuracies ranged from 95.1% to 99.6%, with complementary metrics (sensitivity, specificity, AUC) exceeding 90% in most cases. Despite their promise, ViTs face challenges related to extensive data requirements and computational complexity.
Emerging solutions, including hybrid architectures, self-supervised pretraining, and hierarchical embeddings, aim to mitigate these limitations. Future directions involve developing lightweight, privacy-preserving ViT variants and enhancing model explainability to support trustworthy clinical adoption.
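
To make the patch-and-attend mechanism described above concrete, here is a minimal sketch in pure Python. It assumes a single-channel image, one single-head attention layer with a residual connection, and no LayerNorm, MLP block, or training. It is not the architecture used in any of the ten reviewed studies; the function names (patchify, self_attention, init_params, vit_forward), the toy 8 x 8 image, and all sizes are hypothetical. The attention step follows the standard scaled dot-product form softmax(QK^T / sqrt(d_k))V.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W single-channel image into non-overlapping patch x patch
    tiles (row by row) and flatten each tile into one vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Every token attends to every other token, i.e. global context in one layer."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def init_params(patch_dim: int, d_model: int, n_tokens: int, n_classes: int,
                seed: int = 0) -> Dict[str, Matrix]:
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_embed": mat(patch_dim, d_model),  # linear patch embedding
        "cls": mat(1, d_model),              # learnable class token (one row)
        "pos": mat(n_tokens, d_model),       # position embeddings: class token + patches
        "wq": mat(d_model, d_model),
        "wk": mat(d_model, d_model),
        "wv": mat(d_model, d_model),
        "w_head": mat(d_model, n_classes),   # classification head
    }


def vit_forward(image: Matrix, patch: int, params: Dict[str, Matrix]) -> Tuple[List[float], Matrix]:
    tokens = matmul(patchify(image, patch), params["w_embed"])  # N x d_model
    tokens = params["cls"] + tokens                              # prepend class token: (N + 1) x d_model
    tokens = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, params["pos"])]
    attended, weights = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(tokens, attended)]  # residual connection
    logits = matmul([out[0]], params["w_head"])[0]               # classify from the class token
    return logits, weights


# Toy usage: an 8 x 8 "scan" split into four 4 x 4 patches, two output classes.
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
params = init_params(patch_dim=4 * 4, d_model=8, n_tokens=(8 // 4) ** 2 + 1, n_classes=2)
logits, weights = vit_forward(image, patch=4, params=params)
print("logits:", [round(v, 4) for v in logits])
print("attention matrix:", len(weights), "x", len(weights[0]))
```

Each row of the attention matrix spans all five tokens (the class token plus four patches), so every patch can draw on every other patch within a single layer; this is the global-context property contrasted above with a CNN's local receptive field. A practical ViT stacks many such blocks, uses multiple attention heads with LayerNorm and MLP sublayers, and learns its weights from data.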

Keywords:

Deep learning; Medical imaging; Transformer; Image processing; Pattern recognition (psychology); Contextual image classification; Machine vision; Facial recognition system
