JOURNAL ARTICLE

Structure-Aware Cross-Modal Transformer for Depth Completion

Linqing Zhao, Yi Wei, Jianqin Li, Jie Zhou, Jiwen Lu

Journal: IEEE Transactions on Image Processing Year: 2024 Vol: 33 Pages: 1016-1031 Publisher: Institute of Electrical and Electronics Engineers

Abstract

In this paper, we present a Structure-aware Cross-Modal Transformer (SCMT) to fully capture the 3D structures hidden in sparse depths for depth completion. Most existing methods learn to predict dense depths by taking depths as an additional channel of RGB images or learning 2D affinities to perform depth propagation. However, they fail to exploit 3D structures implied in the depth channel, thereby losing the informative 3D knowledge that provides important priors to distinguish the foreground and background features. Moreover, since these methods rely on the color textures of 2D images, it is challenging for them to handle poor-texture regions without the guidance of explicit 3D cues. To address this, we disentangle the hierarchical 3D scene-level structure from the RGB-D input and construct a pathway to make sharp depth boundaries and object shape outlines accessible to 2D features. Specifically, we extract 2D and 3D features from depth inputs and the back-projected point clouds respectively by building a two-stream network. To leverage 3D structures, we construct several cross-modal transformers to adaptively propagate multi-scale 3D structural features to the 2D stream, energizing 2D features with priors of object shapes and local geometries. Experimental results show that our SCMT achieves state-of-the-art performance on three popular outdoor (KITTI) and indoor (VOID and NYU) benchmarks.
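The two ingredients the abstract names, back-projecting sparse depth into a point cloud and letting 2D features attend to 3D features, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, treats a depth of 0 as a missing measurement, and uses plain scaled dot-product attention without the learned projections or multi-scale design a real model would have. The function names `back_project` and `cross_attention` are illustrative only.

```python
import math


def back_project(depth, fx, fy, cx, cy):
    """Lift valid pixels of a sparse depth map to 3D camera-frame points.

    depth is a list of rows; a value of 0 marks a missing measurement
    (an assumption of this sketch). Returns ((u, v), (X, Y, Z)) pairs.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append(((u, v), (x, y, z)))
    return points


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    In a cross-modal setting the queries would come from the 2D stream
    and the keys/values from the 3D stream; here they are raw vectors.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [
            sum(w * val[i] for w, val in zip(weights, values)) / total
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy usage: a 2x2 sparse depth map with two valid measurements.
sparse_depth = [[0.0, 2.0], [4.0, 0.0]]
cloud = back_project(sparse_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
point_features = [list(xyz) for _, xyz in cloud]

# One hypothetical 2D feature vector queries the 3D points.
fused = cross_attention([[0.1, 0.2, 0.3]], point_features, point_features)
```

Each output row is a convex combination of the 3D value vectors, which is one simple way to see how geometric cues could be propagated into a 2D feature stream.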

Keywords:
Artificial intelligence; Computer science; Leverage (statistics); Computer vision; RGB color model; Exploit; Point cloud; Prior probability; Transformer; Modal; Pattern recognition (psychology); Deep learning; Engineering

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 6.36
Refs: 88
Citation Normalized Percentile: 0.94

Topics

Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Enhancement Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Optical measurement and interference techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Cross-Modal Transformer for Point Cloud Completion

Xing He, Zhe Zhu, Xuefeng Yan, Yanwen Guo, Lina Gong, Mingqiang Wei

Journal: Journal of Computer-Aided Design & Computer Graphics Year: 2024 Vol: 36 (7) Pages: 1026-1033

JOURNAL ARTICLE

Adaptive Context-Aware Multi-Modal Network for Depth Completion

Shanshan Zhao, Mingming Gong, Huan Fu, Dacheng Tao

Journal: IEEE Transactions on Image Processing Year: 2021 Vol: 30 Pages: 5264-5276

JOURNAL ARTICLE

CASwin Transformer: A Hierarchical Cross Attention Transformer for Depth Completion

Chunyu Feng, Xiaonian Wang, Yangyang Zhang, Chengfeng Zhao, Mengxuan Song

Journal: 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC) Year: 2022 Pages: 2836-2841

JOURNAL ARTICLE

Structure-Aware Transformer for hyper-relational knowledge graph completion

Junjie Wang, Huajun Chen, Wen Zhang

Journal: Expert Systems with Applications Year: 2025 Vol: 277 Pages: 126992