Structure-Aware Cross-Modal Transformer for Depth Completion

Linqing Zhao; Yi Wei; Jianqin Li; Jie Zhou; Jiwen Lu

doi:10.1109/tip.2024.3355807


JOURNAL ARTICLE


Year: 2024

Journal: IEEE Transactions on Image Processing

Vol: 33

Pages: 1016-1031

Publisher: Institute of Electrical and Electronics Engineers


Abstract

In this paper, we present a Structure-aware Cross-Modal Transformer (SCMT) to fully capture the 3D structures hidden in sparse depths for depth completion. Most existing methods learn to predict dense depths by taking depths as an additional channel of RGB images or learning 2D affinities to perform depth propagation. However, they fail to exploit 3D structures implied in the depth channel, thereby losing the informative 3D knowledge that provides important priors to distinguish the foreground and background features. Moreover, since these methods rely on the color textures of 2D images, it is challenging for them to handle poor-texture regions without the guidance of explicit 3D cues. To address this, we disentangle the hierarchical 3D scene-level structure from the RGB-D input and construct a pathway to make sharp depth boundaries and object shape outlines accessible to 2D features. Specifically, we extract 2D and 3D features from depth inputs and the back-projected point clouds respectively by building a two-stream network. To leverage 3D structures, we construct several cross-modal transformers to adaptively propagate multi-scale 3D structural features to the 2D stream, energizing 2D features with priors of object shapes and local geometries. Experimental results show that our SCMT achieves state-of-the-art performance on three popular outdoor (KITTI) and indoor (VOID and NYU) benchmarks.
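Two steps named in the abstract can be illustrated with a toy example: back-projecting sparse depth into a point cloud, and letting 2D features attend to 3D features. The sketch below is not the authors' SCMT implementation. It assumes a pinhole camera with made-up intrinsics (`fx`, `fy`, `cx`, `cy`), uses raw point coordinates as stand-ins for learned 3D features, and applies a single attention head with no learned projections. The names `back_project` and `cross_attention` are hypothetical.

```python
import math


def back_project(depth, fx, fy, cx, cy):
    """Lift valid pixels of a sparse depth map to 3D camera-frame points.

    depth is a list of rows; 0 marks a pixel with no measurement.
    Returns a list of ((u, v), (x, y, z)) pairs.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # keep only pixels that carry a depth measurement
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append(((u, v), (x, y, z)))
    return points


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * val[j] for w, val in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Toy 2x3 sparse depth map in metres; the intrinsics are assumptions.
depth = [[0.0, 2.0, 0.0],
         [4.0, 0.0, 0.0]]
points = back_project(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
xyz = [list(p) for _, p in points]

# Hypothetical 2D-stream features, one 3-vector per query pixel.
feat_2d = [[0.1, 0.0, 1.0], [0.0, 0.2, 1.0]]

# Queries come from the 2D stream; keys and values come from the 3D points.
fused = cross_attention(feat_2d, keys=xyz, values=xyz)
```

Each row of `fused` is a weighted average of the 3D points, so a 2D feature receives geometry from the points it scores highest against. A real two-stream network would use learned feature extractors and learned query, key, and value projections at several scales.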

Keywords:

Artificial intelligence; Computer science; Leverage (statistics); Computer vision; RGB color model; Exploit; Point cloud; Prior probability; Transformer; Modal; Pattern recognition (psychology); Deep learning; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 6.36

Citation Normalized Percentile: 0.94

Topics

Advanced Vision and Imaging

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Enhancement Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Optical measurement and interference techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

