Hao Yuan, Kun Liu, Jiechuan Shi, Can Wang, Weiwei Wang
In recent years, the development of deep learning has brought widespread attention to the Vision Transformer (ViT) as an emerging image classification method. Remote sensing image classification is an important task in the remote sensing field with broad application prospects. This paper explores a ViT-based remote sensing image classification method that addresses the limitations of traditional convolutional neural networks in global perception, context information retrieval, and positional encoding. The Vision Transformer is a deep neural network built on the self-attention mechanism; it captures global context information in images and has achieved remarkable performance across a variety of computer vision tasks. The classification performance of the ViT model is evaluated and compared on several remote sensing datasets. Experimental results demonstrate that the ViT-based method exhibits outstanding accuracy and generalization ability: compared with traditional convolutional neural networks, it better captures global features in remote sensing images and scales better to large remote sensing image datasets, performing well against state-of-the-art methods on multiple benchmarks. Specifically, the Vision Transformer achieves average classification accuracies of 95.41%, 98.26%, 93.74%, and 95.25% on the AID, UC-Merced, NWPU-RESISC45, and Optimal31 datasets, respectively.
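The abstract attributes ViT's global perception to self-attention over image patches. A minimal NumPy sketch of scaled dot-product self-attention, the core ViT operation, is given below; all dimensions and the random projection matrices are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of scaled dot-product self-attention, the core of the Vision
# Transformer. A remote sensing image is split into patches, each patch is
# embedded as a vector, and self-attention lets every patch attend to every
# other patch, providing the global context the abstract describes.
# All sizes here are illustrative, not from the paper.

rng = np.random.default_rng(0)

num_patches = 16   # e.g. a small image split into a 4x4 grid of patches
embed_dim = 32     # patch embedding size (assumed)

# Patch embeddings: one row per patch.
x = rng.standard_normal((num_patches, embed_dim))

# Query/key/value projections (random stand-ins for learned weights).
W_q = rng.standard_normal((embed_dim, embed_dim))
W_k = rng.standard_normal((embed_dim, embed_dim))
W_v = rng.standard_normal((embed_dim, embed_dim))

def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention over all patches."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (patches, patches)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # each patch mixes all patches

out = self_attention(x, W_q, W_k, W_v)
print(out.shape)  # (16, 32): every patch embedding now carries global context
```

In a full ViT classifier, this operation is repeated in multi-head form across several Transformer encoder layers, and a classification head maps the resulting representation to the scene classes of datasets such as AID or UC-Merced.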