Hao Yuan, Kun Liu, Jiechuan Shi, Can Wang, Weiwei Wang
In recent years, the development of deep learning has brought widespread attention to the Vision Transformer (ViT) as an emerging image classification method. Remote sensing image classification is an important task in the remote sensing field with broad application prospects. This paper explores a ViT-based remote sensing image classification method that addresses the limitations of traditional convolutional neural networks in global perception, context information retrieval, and positional encoding. The Vision Transformer is a deep neural network built on the self-attention mechanism; it captures global context information in images and has achieved remarkable performance across a variety of computer vision tasks. The classification performance of the ViT model is evaluated and compared on several remote sensing datasets. Experimental results demonstrate that the ViT-based method exhibits outstanding accuracy and generalization ability: compared with traditional convolutional neural networks, it better captures global features in remote sensing images and scales better to large remote sensing image datasets, performing well against state-of-the-art methods on multiple benchmarks. Specifically, the Vision Transformer achieves average classification accuracies of 95.41%, 98.26%, 93.74%, and 95.25% on the AID, UC-Merced, NWPU-RESISC45, and Optimal31 datasets, respectively.
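The abstract attributes ViT's global perception to self-attention over image patches. A minimal NumPy sketch of scaled dot-product self-attention, the core ViT operation, is given below; all dimensions and the random projection matrices are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of scaled dot-product self-attention, the core of the Vision
# Transformer. A remote sensing image is split into patches, each patch is
# embedded as a vector, and self-attention lets every patch attend to every
# other patch, providing the global context the abstract describes.
# All sizes here are illustrative, not from the paper.

rng = np.random.default_rng(0)

num_patches = 16   # e.g. a small image split into a 4x4 grid of patches
embed_dim = 32     # patch embedding size (assumed)

# Patch embeddings: one row per patch.
x = rng.standard_normal((num_patches, embed_dim))

# Query/key/value projections (random stand-ins for learned weights).
W_q = rng.standard_normal((embed_dim, embed_dim))
W_k = rng.standard_normal((embed_dim, embed_dim))
W_v = rng.standard_normal((embed_dim, embed_dim))

def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention over all patches."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (patches, patches)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # each patch mixes all patches

out = self_attention(x, W_q, W_k, W_v)
print(out.shape)  # (16, 32): every patch embedding now carries global context
```

In a full ViT classifier, this operation is repeated in multi-head form across several Transformer encoder layers, and a classification head maps the resulting representation to the scene classes of datasets such as AID or UC-Merced.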