JOURNAL ARTICLE

Image Classification Based on Vision Transformer

Attiapo Acybah Morel Omer

Year: 2024   Journal: Journal of Computer and Communications   Vol: 12 (04)   Pages: 49-59   Publisher: Scientific Research Publishing

Abstract

This research introduces an innovative approach to image classification based on the Vision Transformer (ViT) architecture. Vision Transformers have emerged as a promising alternative to convolutional neural networks (CNNs) for image analysis tasks, offering scalability and improved performance. ViT models capture global dependencies and relationships among the elements of an image, which enhances feature representation. When trained on different datasets, the ViT model demonstrates strong classification capabilities across different image categories. The ViT's ability to process image patches directly, without relying on spatial hierarchies, streamlines the classification process and improves computational efficiency. In this research, we present a Python implementation that uses TensorFlow to apply the ViT model to image classification. Images from four animal categories (cow, dog, horse, and sheep) are used for classification. The ViT model extracts meaningful features from the images, and a classification head is added to predict the class labels. The model is trained on the CIFAR-10 dataset and evaluated for accuracy and performance. The findings of this study demonstrate not only the effectiveness of the Vision Transformer model in image classification tasks but also its potential as a powerful tool for solving complex visual recognition problems. This research addresses existing gaps in knowledge by introducing a novel approach that challenges traditional CNNs in the field of computer vision. While CNNs have been the dominant architecture for image classification tasks, they have limitations in capturing long-range dependencies in image data and require hand-designed hierarchical feature extraction.
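
The pipeline the abstract describes (split the image into patches, relate all patches to one another globally, then attach a classification head) can be illustrated with a minimal NumPy sketch. This is not the authors' TensorFlow implementation: the patch size, embedding width, single attention layer, random untrained weights, and all function and parameter names below are assumptions chosen for illustration. Only the 32x32x3 input size (that of CIFAR-10 images) and the four class names come from the text.

```python
import numpy as np

CLASS_NAMES = ["cow", "dog", "horse", "sheep"]  # categories named in the abstract
PATCH_SIZE = 8   # assumed patch size, not stated in the abstract
EMBED_DIM = 64   # assumed embedding width, not stated in the abstract


def extract_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    grid = image.reshape(gh, patch_size, gw, patch_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return grid.reshape(gh * gw, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # every token attends to every token
    return weights @ v, weights


def init_params(rng, patch_dim, num_patches, dim, num_classes):
    """Random, untrained weights (illustration only)."""
    def w(*shape):
        return rng.normal(0.0, 0.02, shape)
    return {
        "w_embed": w(patch_dim, dim),
        "cls_token": w(1, dim),
        "pos_embed": w(num_patches + 1, dim),
        "w_q": w(dim, dim), "w_k": w(dim, dim), "w_v": w(dim, dim),
        "w_head": w(dim, num_classes),
    }


def vit_forward(image, params, patch_size):
    """Patch embedding, one attention layer, then a linear classification head."""
    patches = extract_patches(image, patch_size)
    tokens = patches @ params["w_embed"]                 # (N, D)
    tokens = np.vstack([params["cls_token"], tokens])    # prepend class token
    tokens = tokens + params["pos_embed"]                # add position information
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"])
    tokens = tokens + attended                           # residual connection
    logits = tokens[0] @ params["w_head"]                # classify from class token
    return softmax(logits), weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for one CIFAR-10-sized image
patches = extract_patches(image, PATCH_SIZE)
params = init_params(rng, patches.shape[1], patches.shape[0],
                     EMBED_DIM, len(CLASS_NAMES))
probs, attn = vit_forward(image, params, PATCH_SIZE)
print("patches:", patches.shape, "attention:", attn.shape)
print("predicted (untrained):", CLASS_NAMES[int(np.argmax(probs))])
```

Each row of the attention matrix spans every patch in the image, which is the sense in which the model captures global dependencies without a convolutional hierarchy. A practical model would stack several multi-head attention blocks with layer normalization and feed-forward layers, and would learn the weights from labeled data.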

Keywords:
Artificial intelligence; Computer vision; Computer science; Transformer; Pattern recognition (psychology); Engineering; Electrical engineering; Voltage

Metrics

Cited By: 11
FWCI (Field Weighted Citation Impact): 7.49
Refs: 3
Citation Normalized Percentile: 0.95

Topics

Industrial Vision Systems and Defect Detection
Physical Sciences →  Engineering →  Industrial and Manufacturing Engineering
Image Processing Techniques and Applications
Physical Sciences →  Engineering →  Media Technology
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition