JOURNAL ARTICLE

End-to-End Remote Sensing Image Scene Classification with Vision Transformers

Abstract

Vision Transformer (ViT), a deep neural network built on the self-attention mechanism, has attracted wide attention as an image classification method because it captures global context in images and performs strongly across computer vision tasks. Remote sensing image scene classification is an important task in remote sensing with broad application prospects. This paper explores a ViT-based method for remote sensing image classification that addresses the limitations of traditional convolutional neural networks in global perception, contextual information capture, and positional encoding. The classification performance of the ViT model is evaluated and compared on several remote sensing datasets. The experimental results show that the ViT-based method achieves high accuracy and good generalization ability. Compared with traditional convolutional neural networks, it captures global features in remote sensing images more effectively and scales better to large-scale remote sensing image data, and it performs well compared with state-of-the-art methods. Specifically, Vision Transformer achieves average classification accuracies of 95.41%, 98.26%, 93.74%, and 95.25% on the AID, UC-Merced, NWPU-RESISC45, and OPTIMAL-31 datasets, respectively.
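The global-context claim in the abstract can be illustrated with a minimal NumPy sketch of the two ingredients a ViT-style model relies on: splitting an image into patches and applying self-attention across all of them. This is an illustrative sketch only, not the authors' implementation. The function names (`patchify`, `self_attention`), the 32×32 random image, the patch size of 8, the embedding width of 24, and the random weights are all assumptions chosen for the example; a real model would use learned weights, multiple heads and layers, and a classification token.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    rows, cols = h // patch_size, w // patch_size
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, ps, ps, C)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # one row of weights per query patch
    return weights @ v, weights


# Hypothetical inputs: a random 32x32 RGB "scene" and random projections.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, patch_size=8)  # 16 patches of 8*8*3 = 192 values

dim = 24  # assumed embedding width
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))
pos_embed = rng.normal(scale=0.02, size=(patches.shape[0], dim))
tokens = patches @ w_embed + pos_embed  # patch embedding + positional encoding

w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` spans all 16 patches, so every patch can draw on every other patch within a single layer. A convolutional layer, by contrast, sees only a local neighborhood at each position.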

Keywords:
Computer science; Convolutional neural network; Artificial intelligence; Contextual image classification; Transformer; Deep learning; Scalability; Computer vision; Remote sensing; Image (mathematics); Engineering; Database; Geography

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.43
Refs: 18
Citation Normalized Percentile: 0.65

Topics

Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Remote Sensing and Land Use
Physical Sciences →  Earth and Planetary Sciences →  Atmospheric Science
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition