In this paper, we present a scene classification method based on vision transformers. These networks, which are now the standard models in natural language processing (NLP), do not rely on convolution blocks as convolutional neural networks (CNNs) do. Instead, they are based on a mechanism known as multi-head self-attention (MSA), which captures contextual relations between image pixels regardless of their spatial distance. In the first step, the images under analysis are split into patches, which are then flattened and embedded to form a sequence of tokens. Position embeddings are added to this sequence to preserve the order of the patches. The resulting sequence is then fed to several MSA layers to generate the final representation. To increase classification performance, we employ several data augmentation strategies to expand the size and diversity of the training data. Additionally, we show experimentally that the network can be compressed by pruning half of its layers while maintaining competitive performance. We further investigate the performance of the data-efficient image transformer (DeiT), a variant of the model trained by knowledge distillation that requires less training data. Experimental results on two remote sensing datasets show that vision transformers can outperform state-of-the-art methods based on CNNs.
Yakoub Bazi, Laila Bashmal, Mohamad Mahmoud Al Rahhal, Reham Al-Dayil, Naif Al Ajlan
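The following is a minimal PyTorch sketch, not the authors' code, of the pipeline the abstract describes: the image is split into patches, the patches are flattened and linearly embedded, learned position embeddings are added to preserve patch order, and the token sequence is processed by a stack of multi-head self-attention (Transformer encoder) layers. The class name TinyViT and all sizes (patch size 16, embedding dimension 192, 6 layers, 3 heads, 10 classes) are illustrative assumptions, not values taken from the paper.

# Minimal sketch of the vision-transformer pipeline described in the abstract.
# Names and dimensions are illustrative assumptions, not the authors' settings.
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=192,
                 depth=6, heads=3, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch splitting + flattening + linear embedding via a strided convolution.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # Class token and learned position embeddings preserve patch order.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # Stack of multi-head self-attention (Transformer encoder) layers.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                            # x: (B, 3, H, W)
        tokens = self.patch_embed(x)                 # (B, dim, H/ps, W/ps)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])               # classify from the class token

model = TinyViT()
logits = model(torch.randn(2, 3, 224, 224))          # logits has shape (2, 10)

Under this sketch, the layer-pruning experiment mentioned in the abstract would roughly correspond to rebuilding the encoder with num_layers=depth // 2 and fine-tuning the smaller model; the exact pruning procedure used in the paper is described in its methodology section.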