JOURNAL ARTICLE

Latent Edge Guided Depth Super-Resolution Using Attention-Based Hierarchical Multi-Modal Fusion

Hui Lan, Cheolkon Jung

Year: 2024; Journal: IEEE Access; Vol: 12; Pages: 114512-114526; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Color guided depth super-resolution (SR) aims to reconstruct a high-resolution (HR) depth image from a low-resolution (LR) one, guided by its paired HR color image. However, when the sampling factor is large, color guided depth SR struggles to reconstruct accurate depth edges because of the severe loss of high-frequency (HF) components. In this paper, we propose a latent edge guided depth SR network using attention-based hierarchical multi-modal fusion, named LEDSRNet. We extract hierarchical multi-modal features from the HR color and LR depth images and perform selective fusion to estimate the residual map for depth SR. First, we perform gradient map estimation to generate accurate depth edges from the input HR color image and the interpolated LR depth image, filtering out unnecessary edges in the HR color image and preventing texture copying artifacts in depth SR. Then, we perform depth upsampling to get depth edges from the input LR depth image and refine them guided by gradient features in the latent space. Next, we fuse the features extracted from gradient map estimation and depth upsampling to obtain the residual map for depth SR. Finally, we reconstruct the SR depth image by adding the residual map to the interpolated LR depth image. We design an attention-based multi-level residual block (AMRB) as the basic block of LEDSRNet to extract both shallow and deep features from color and depth images for hierarchical multi-modal fusion. In the loss function, we use a binarized gradient map of the ground truth depth image, i.e., a mask map, to calculate the loss for edge and smooth areas separately, preventing excessive smoothing of edge regions in the reconstructed SR depth image. Extensive experiments show that LEDSRNet reconstructs accurate depth edges even at large sampling factors and achieves the best RMSE with low running time and a small number of model parameters. The results indicate that LEDSRNet outperforms state-of-the-art methods in both visual quality and quantitative measurements.
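
The two steps the abstract states most concretely, residual reconstruction and the mask-based loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the finite-difference gradient, the threshold value, the L1 error, and the equal weighting of the two terms are all assumptions made for illustration; the abstract only says that a binarized ground-truth gradient map is used to compute the loss separately for edge and smooth areas.

```python
import numpy as np

def gradient_magnitude(depth):
    """Finite-difference gradient magnitude of a 2-D depth map (assumed operator)."""
    gy, gx = np.gradient(depth.astype(np.float64))
    return np.hypot(gx, gy)

def edge_mask(gt_depth, threshold=0.1):
    """Binarize the ground-truth gradient map: 1 on edges, 0 in smooth areas.
    The threshold is a hypothetical value, not taken from the paper."""
    return (gradient_magnitude(gt_depth) > threshold).astype(np.float64)

def reconstruct(interp_lr_depth, residual):
    """SR depth = interpolated LR depth + predicted residual map."""
    return interp_lr_depth + residual

def masked_l1_loss(sr_depth, gt_depth, mask, edge_weight=1.0, smooth_weight=1.0):
    """Average the absolute error separately over edge and smooth pixels,
    then combine the two terms (L1 and the weights are assumptions)."""
    err = np.abs(sr_depth - gt_depth)
    n_edge = mask.sum()
    n_smooth = mask.size - n_edge
    edge_loss = (err * mask).sum() / max(n_edge, 1.0)
    smooth_loss = (err * (1.0 - mask)).sum() / max(n_smooth, 1.0)
    return edge_weight * edge_loss + smooth_weight * smooth_loss

# Toy example: a 4x4 depth map with one vertical step edge.
gt = np.zeros((4, 4))
gt[:, 2:] = 1.0
mask = edge_mask(gt)

# A perfect residual recovers the ground truth from a flat interpolated input.
interp = np.full((4, 4), 0.5)
sr = reconstruct(interp, gt - interp)
print(masked_l1_loss(sr, gt, mask))
```

Because each region is normalized by its own pixel count, errors concentrated on the comparatively few edge pixels are not diluted by the much larger smooth area, which is one plausible reading of how a separate edge term discourages over-smoothing.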

Keywords:
Upsampling; Artificial intelligence; Depth map; Computer vision; Residual; Image gradient; Computer science; Color image; Image fusion; Ground truth; Image resolution; Pattern recognition (psychology); Image (mathematics); Image processing; Algorithm

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.53
Refs: 53
Citation Normalized Percentile: 0.54

Topics

Advanced Image Processing Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Processing Techniques and Applications
Physical Sciences →  Engineering →  Media Technology
