Jianjun Lei, Demin Tu, Bo Peng, Jie Zhu, Zhe Zhang, Chong Wu, Qingming Huang
Recently, deep learning-based visual localization has gained significant attention and made remarkable advancements. Although previous visual localization methods have achieved promising performance on indoor scenes or outdoor street scenes, there have been few attempts at visual localization in aerial scenes. In this article, a depth-aware aerial localization transformer (DALTR) is proposed to learn camera poses in real-world aerial scenes with the assistance of depth maps. To improve the ability of the network to perceive aerial scenes, a multi-level depth embedding transformer module is presented that adaptively incorporates depth information into multiple levels of the transformer. In addition, to encourage the piecewise-smooth geometric characteristic of the scene coordinates, a depth-guided smoothness constraint is developed to provide additional supervision for scene coordinate regression. Extensive experimental results on aerial localization benchmark datasets demonstrate that the proposed DALTR achieves superior aerial localization performance.
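The depth-guided smoothness constraint described above can be illustrated with a minimal sketch. This is not the paper's actual loss formulation, only a hedged assumption of how such a constraint might look: an edge-aware regularizer that penalizes spatial gradients of the regressed scene coordinates, downweighted where the depth map itself has discontinuities, so that smoothness is encouraged within surfaces but not across depth edges. The function name `depth_guided_smoothness` and the weighting form `exp(-alpha * |∇depth|)` are illustrative choices, not taken from the paper.

```python
import numpy as np

def depth_guided_smoothness(coords, depth, alpha=10.0):
    """Sketch of a depth-guided smoothness penalty (illustrative, not the paper's loss).

    coords: (H, W, 3) array of regressed scene coordinates.
    depth:  (H, W) depth map used to gate the smoothness weight.
    alpha:  sharpness of the depth-edge gating (hypothetical hyperparameter).
    """
    # L1 magnitude of horizontal/vertical scene-coordinate gradients.
    dc_x = np.abs(np.diff(coords, axis=1)).sum(axis=-1)  # (H, W-1)
    dc_y = np.abs(np.diff(coords, axis=0)).sum(axis=-1)  # (H-1, W)

    # Depth gradients: large values indicate object boundaries.
    dd_x = np.abs(np.diff(depth, axis=1))                # (H, W-1)
    dd_y = np.abs(np.diff(depth, axis=0))                # (H-1, W)

    # Downweight the smoothness penalty across depth discontinuities.
    w_x = np.exp(-alpha * dd_x)
    w_y = np.exp(-alpha * dd_y)

    return float((dc_x * w_x).mean() + (dc_y * w_y).mean())
```

Under this sketch, a scene-coordinate edge that coincides with a depth edge is penalized far less than the same edge placed in a region of constant depth, which is one plausible way to encourage piecewise-smooth scene coordinates.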