Image Classification Based on Vision Transformer

Attiapo Acybah Morel Omer

doi:10.4236/jcc.2024.124005

JOURNAL ARTICLE

Year: 2024; Journal: Journal of Computer and Communications; Vol: 12 (04); Pages: 49-59; Publisher: Scientific Research Publishing

Abstract

This research introduces an innovative approach to image classification based on the Vision Transformer (ViT) architecture. Vision Transformers have emerged as a promising alternative to convolutional neural networks (CNNs) for image analysis tasks, offering scalability and improved performance. ViT models are able to capture global dependencies and relationships among the elements of an image, which enhances feature representation. When trained on different datasets, the ViT model demonstrates strong classification capabilities across different image categories. The ViT's ability to process image patches directly, without relying on spatial hierarchies, streamlines the classification process and improves computational efficiency. In this research, we present a Python implementation that uses TensorFlow to apply the ViT model to image classification. Images from four animal categories (cow, dog, horse, and sheep) are used for classification. The ViT model extracts meaningful features from the images, and a classification head is added to predict the class labels. The model is trained on the CIFAR-10 dataset and evaluated for accuracy and performance. The findings of this study demonstrate not only the effectiveness of the Vision Transformer model in image classification tasks but also its potential as a powerful tool for solving complex visual recognition problems. This research addresses existing gaps in knowledge by introducing a novel approach that challenges traditional CNNs in the field of computer vision. While CNNs have been the dominant architecture for image classification tasks, they have limitations in capturing long-range dependencies in image data and require hand-designed hierarchical feature extraction.
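
The patch-based processing mentioned in the abstract can be illustrated with a minimal, framework-free Python sketch. It is not the paper's TensorFlow implementation: the function name `extract_patches` and the patch size of 4 are assumptions chosen for illustration, and only the 32x32 RGB input size follows from the CIFAR-10 dataset named above. The sketch shows how an image might be cut into non-overlapping patches, each flattened into one vector that a ViT would then embed as a token.

```python
# Framework-free sketch of ViT-style patch extraction (illustrative only).
# Assumptions: 32x32 RGB input (the CIFAR-10 image size) and a hypothetical
# patch size of 4; the abstract does not state the patch size actually used.

def extract_patches(image, patch_size):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches.

    Returns the patches in row-major order; each patch is a flat list of
    patch_size * patch_size * C values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])  # append all channel values
            patches.append(patch)
    return patches


# Dummy 32x32x3 image whose pixel values encode their own coordinates.
image = [[[row, col, 0] for col in range(32)] for row in range(32)]
patches = extract_patches(image, patch_size=4)

num_patches = len(patches)   # (32 / 4) ** 2 patches, one token each
patch_dim = len(patches[0])  # 4 * 4 * 3 values per flattened patch
print(num_patches, patch_dim)
```

Under these assumptions, each flattened patch would be linearly projected and combined with a position embedding before entering the transformer encoder, with the classification head operating on the encoder output. A smaller patch size yields more tokens per image and therefore a higher attention cost.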

Keywords:

Artificial intelligence; Computer vision; Computer science; Transformer; Pattern recognition (psychology); Engineering; Electrical engineering; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 7.49

Citation Normalized Percentile: 0.95

Topics

Industrial Vision Systems and Defect Detection

Physical Sciences → Engineering → Industrial and Manufacturing Engineering

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Medical image classification based on enhanced Vision Transformer

Vision Transformer (ViT)-based Applications in Image Classification

Weather Image Classification Using Vision Transformer

Biomedical image classification using vision transformer