JOURNAL ARTICLE

Learning Self-Supervised Vision Transformers from Scratch for Aerial Person Re-Identification

Abstract

In recent years, person re-identification (Re-ID), a widely studied computer vision task, has approached saturation under the closed-world setting, which encourages researchers to explore more realistic scenarios. Among them, person Re-ID in aerial imagery has been proposed and developed because of its practical importance in public security. However, because aerial person images are taken by unmanned aerial vehicles (UAVs) and are affected by camera height and viewing angle, they suffer from more serious problems, such as weak appearance features and occlusion, than ground-level person images. Most current state-of-the-art person Re-ID methods on closed-world datasets are based on local convolutional neural networks and hardly work well when applied directly to aerial person Re-ID tasks. In this paper, we improve the emerging vision transformer (ViT) and apply it to person Re-ID in aerial imagery. It should be noted that ViTs require pre-training on a large amount of data to achieve competitive performance. Considering the limitations of data, computing power, and flexibility in practical scenarios, we improve the pre-training process based on self-supervised learning and train ViTs from scratch with limited data. Specifically, in the pre-training stage, a self-supervised paradigm based on parameter instance discrimination is applied to capture feature alignment and instance similarity, which alleviates the data hunger of ViTs caused by their lack of inductive bias. Extensive comparative experiments are conducted on the aerial Re-ID dataset. Our method achieves a Rank-1 accuracy of 65.29% and a mean average precision (mAP) of 57.31%, which demonstrates its effectiveness in aerial person Re-ID tasks.
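
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Rank-1 accuracy and mAP are commonly computed from a query-by-gallery distance matrix. It is not the authors' evaluation code: the function name `rank1_and_map`, the toy identities, and the toy distances are hypothetical, and the sketch assumes a simplified protocol without the same-camera filtering that many Re-ID benchmarks apply.

```python
from typing import Sequence, Tuple


def rank1_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) for a query-by-gallery distance matrix."""
    hits = 0
    average_precisions = []
    for q, row in enumerate(dist):
        # Rank gallery items from most to least similar (smallest distance first).
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # Assumption: queries without any true match are skipped.
        hits += matches[0]  # Rank-1 hit if the top-ranked item has the same identity.
        # Average precision: mean of the precision at each true-match rank.
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    if n == 0:
        raise ValueError("no query has a matching gallery identity")
    return hits / n, sum(average_precisions) / n


# Hypothetical toy data: two queries, three gallery images.
toy_query_ids = [1, 2]
toy_gallery_ids = [1, 2, 1]
toy_dist = [
    [0.2, 0.5, 0.9],  # query 0: correct identity ranked first and third
    [0.3, 0.4, 0.8],  # query 1: correct identity ranked second
]
rank1, mean_ap = rank1_and_map(toy_dist, toy_query_ids, toy_gallery_ids)
print(f"Rank-1: {rank1:.2%}, mAP: {mean_ap:.2%}")
```

In this toy case only the first query is matched at rank 1, so Rank-1 is lower than it would be under a looser Rank-k criterion, while mAP also rewards the second query for placing its true match near the top.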

Keywords:
Computer science; Artificial intelligence; Computer vision; Transformer; Identification (biology); Scratch; Pattern recognition (psychology); Machine learning; Engineering; Electrical engineering

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 19
Citation Normalized Percentile: 0.23

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Fire Detection and Safety Systems
Physical Sciences →  Engineering →  Safety, Risk, Reliability and Quality
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
