Jianjun Lei, Demin Tu, Bo Peng, Jie Zhu, Zhe Zhang, Chong Wu, Qingming Huang
Recently, deep learning-based visual localization has gained significant attention and made remarkable advancements. Although previous visual localization methods have achieved promising performance on indoor scenes or outdoor street scenes, there have been few attempts at visual localization in aerial scenes. In this article, a depth-aware aerial localization transformer (DALTR) is proposed to learn camera poses in real-world aerial scenes with the assistance of depth maps. To improve the ability of the network to perceive aerial scenes, a multi-level depth embedding transformer module is presented that adaptively incorporates depth information into multiple levels of the transformer. In addition, to encourage the piecewise-smooth geometric characteristic of the scene coordinates, a depth-guided smoothness constraint is developed to provide additional supervision for scene coordinate regression. Extensive experimental results on aerial localization benchmark datasets demonstrate that the proposed DALTR achieves superior aerial localization performance.
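The depth-guided smoothness constraint described above can be illustrated with a minimal sketch. This is not the paper's actual loss formulation, only a hedged assumption of how such a constraint might look: an edge-aware regularizer that penalizes spatial gradients of the regressed scene coordinates, downweighted where the depth map itself has discontinuities, so that smoothness is encouraged within surfaces but not across depth edges. The function name `depth_guided_smoothness` and the weighting form `exp(-alpha * |∇depth|)` are illustrative choices, not taken from the paper.

```python
import numpy as np

def depth_guided_smoothness(coords, depth, alpha=10.0):
    """Sketch of a depth-guided smoothness penalty (illustrative, not the paper's loss).

    coords: (H, W, 3) array of regressed scene coordinates.
    depth:  (H, W) depth map used to gate the smoothness weight.
    alpha:  sharpness of the depth-edge gating (hypothetical hyperparameter).
    """
    # L1 magnitude of horizontal/vertical scene-coordinate gradients.
    dc_x = np.abs(np.diff(coords, axis=1)).sum(axis=-1)  # (H, W-1)
    dc_y = np.abs(np.diff(coords, axis=0)).sum(axis=-1)  # (H-1, W)

    # Depth gradients: large values indicate object boundaries.
    dd_x = np.abs(np.diff(depth, axis=1))                # (H, W-1)
    dd_y = np.abs(np.diff(depth, axis=0))                # (H-1, W)

    # Downweight the smoothness penalty across depth discontinuities.
    w_x = np.exp(-alpha * dd_x)
    w_y = np.exp(-alpha * dd_y)

    return float((dc_x * w_x).mean() + (dc_y * w_y).mean())
```

Under this sketch, a scene-coordinate edge that coincides with a depth edge is penalized far less than the same edge placed in a region of constant depth, which is one plausible way to encourage piecewise-smooth scene coordinates.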