Sparse Transformer-based bins and Polarized Cross Attention decoder for monocular depth estimation

Hai‐Kun Wang; Jiahui Du; Kechen Song; Limin Cui

doi:10.1016/j.jestch.2024.101705


JOURNAL ARTICLE


Year: 2024
Journal: Engineering Science and Technology, an International Journal
Vol: 54
Pages: 101705
Publisher: Elsevier BV



Abstract

Estimating depth from a single image is an important problem with applications across many computer vision domains. Although some recent works obtain the depth map directly from complex, powerful networks, we aim to combine the encoder and decoder feature maps more effectively. To this end, we propose a novel U-Net-like network whose encoder is based on Swin Transformer. For the decoder, we propose Polarized Cross Attention, which combines encoder and decoder features effectively by optimizing the initialization of the k and v vectors. To analyze the decoded output more thoroughly at a global scale, we propose a Sparse Transformer post-processing module. In this module, we adopt Kullback–Leibler divergence to obtain a sparse Q matrix, achieving O((hw)ln(hw)) time complexity and memory usage. Experiments on the KITTI and NYUv2 datasets show that the proposed method improves the accuracy of monocular depth estimation compared with state-of-the-art methods.
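The abstract does not spell out how the sparse Q matrix is built. One well-known way to use Kullback–Leibler divergence for this purpose is the ProbSparse scheme from the Informer model, and the sketch below follows that scheme as an assumption, not as the authors' confirmed method. Each query is scored by how far its attention distribution is from uniform (log-sum-exp minus mean of its scaled dot products, which equals the KL divergence up to a constant), the score is estimated on roughly ln(hw) sampled keys, and only the top-scoring queries receive full attention. The function name, the `factor` parameter, and the mean-of-V fallback for the remaining queries are all illustrative choices.

```python
import math
import numpy as np

def sparse_query_attention(Q, K, V, factor=5, seed=0):
    """Informer-style sparse-query attention (illustrative assumption).

    Q, K, V: arrays of shape (L, d), where L = h * w flattened tokens.
    Returns the (L, d) output and the indices of the active queries.
    """
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    # Number of sampled keys and of active queries, both about factor * ln(L).
    n = min(L, max(1, int(factor * math.ceil(math.log(L)))))
    scale = 1.0 / math.sqrt(d)

    # Score every query against a random subset of keys: O(L ln L) work.
    sample_idx = rng.choice(L, size=n, replace=False)
    S = (Q @ K[sample_idx].T) * scale                      # (L, n)

    # log-sum-exp minus mean equals KL(uniform || softmax(S)) plus ln(n),
    # so ranking by it ranks queries by distance from uniform attention.
    row_max = S.max(axis=1)
    lse = row_max + np.log(np.exp(S - row_max[:, None]).sum(axis=1))
    sparsity = lse - S.mean(axis=1)                        # (L,)

    # Keep only the n most "peaked" queries.
    top = np.argsort(-sparsity)[:n]

    # Inactive queries fall back to the mean of V (what uniform attention gives).
    out = np.tile(V.mean(axis=0), (L, 1))

    # Active queries attend to all keys: another O(L ln L) block.
    A = (Q[top] @ K.T) * scale                             # (n, L)
    A = A - A.max(axis=1, keepdims=True)
    W = np.exp(A)
    W /= W.sum(axis=1, keepdims=True)
    out[top] = W @ V
    return out, top
```

Under these assumptions both the scoring step and the attention step touch about L·ln(L) query-key pairs, which is where a bound of the form O((hw)ln(hw)) would come from.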

Keywords:

Monocular; Transformer; Computer science; Artificial intelligence; Engineering; Electrical engineering; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 1.06

Citation Normalized Percentile: 0.66

Topics

Advanced Vision and Imaging

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Image Enhancement Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition


Related Documents

Transformer-Based Monocular Depth Estimation Using Token Attention

Transformer-based Monocular Depth Estimation with Attention Supervision

Light-weight Monocular Depth Estimation Via Cross Attention Fusion of Sparse LiDAR

Depth Monocular Estimation with Attention-based Encoder-Decoder Network from Single Image

CaBins: CLIP-based Adaptive Bins for Monocular Depth Estimation