Latent Edge Guided Depth Super-Resolution Using Attention-Based Hierarchical Multi-Modal Fusion

Hui Lan; Cheolkon Jung

doi:10.1109/access.2024.3435504


JOURNAL ARTICLE


Year: 2024; Journal: IEEE Access; Vol: 12; Pages: 114512-114526; Publisher: Institute of Electrical and Electronics Engineers


Abstract

Color-guided depth super-resolution (SR) aims to reconstruct a high-resolution (HR) depth image from a low-resolution (LR) one, guided by its paired HR color image. However, when the sampling factor is large, color-guided depth SR struggles to reconstruct accurate depth edges because of the severe loss of high-frequency (HF) components. In this paper, we propose a latent edge guided depth SR network using attention-based hierarchical multi-modal fusion, named LEDSRNet. We extract hierarchical multi-modal features from the HR color and LR depth images and fuse them selectively to estimate the residual map for depth SR. First, we perform gradient map estimation to generate accurate depth edges from the input HR color image and the interpolated LR depth image, filtering out unnecessary edges in the HR color image while preventing texture-copying artifacts in depth SR. Then, we perform depth upsampling to obtain depth edges from the input LR depth image and refine them under the guidance of gradient features in the latent space. Next, we fuse the features extracted from gradient map estimation and depth upsampling to obtain the residual map for depth SR. Finally, we reconstruct the SR depth image by adding the residual map to the interpolated LR depth image. We design an attention-based multi-level residual block (AMRB) as the basic block of LEDSRNet to extract both shallow and deep features from the color and depth images for hierarchical multi-modal fusion. In the loss function, we use a binarized gradient map of the ground-truth depth image, i.e., a mask map, to calculate the loss for edge and smooth areas separately, which prevents excessive smoothing of edge regions in the reconstructed SR depth image. Extensive experiments show that LEDSRNet reconstructs accurate depth edges even at large sampling factors and achieves the best RMSE with low running time and a small number of model parameters. The results indicate that LEDSRNet outperforms state-of-the-art methods in both visual quality and quantitative measurements.
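
The residual reconstruction and the mask-based loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the error norm, the binarization threshold, or the weighting of the two terms, so the L1 error, the `threshold` argument, and the `edge_weight` / `smooth_weight` parameters below are assumptions, and all function names are hypothetical.

```python
import numpy as np


def edge_mask(gt_depth, threshold):
    """Binarize the gradient magnitude of the ground-truth depth (1 = edge).

    The threshold is an assumed hyperparameter; the abstract gives no value.
    """
    gy, gx = np.gradient(gt_depth.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.float64)


def masked_l1_loss(pred, gt, mask, edge_weight=1.0, smooth_weight=1.0):
    """Average the absolute error separately over edge and smooth pixels.

    Using L1 and a weighted sum of the two terms is an assumption.
    """
    abs_err = np.abs(pred - gt)
    smooth = 1.0 - mask
    edge_loss = (abs_err * mask).sum() / max(mask.sum(), 1.0)
    smooth_loss = (abs_err * smooth).sum() / max(smooth.sum(), 1.0)
    return edge_weight * edge_loss + smooth_weight * smooth_loss


def reconstruct(interp_lr_depth, residual):
    """SR depth = interpolated LR depth + predicted residual map."""
    return interp_lr_depth + residual


# Toy example: a 4x4 depth map with one vertical step edge.
gt = np.zeros((4, 4))
gt[:, 2:] = 1.0
mask = edge_mask(gt, threshold=0.25)
loss = masked_l1_loss(gt + 0.1, gt, mask)
```

Averaging each region by its own pixel count keeps the edge term from being swamped by the much larger smooth area, which is one plausible way to realize the separate treatment of edge and smooth regions that the abstract describes.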

Keywords:

Upsampling; Artificial intelligence; Depth map; Computer vision; Residual; Image gradient; Computer science; Color image; Image fusion; Ground truth; Image resolution; Pattern recognition (psychology); Image (mathematics); Image processing; Algorithm

Metrics

FWCI (Field Weighted Citation Impact): 0.53

Citation Normalized Percentile: 0.54

Topics

Advanced Image Processing Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Vision and Imaging

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Related Documents

High-Resolution Depth Maps Imaging via Attention-Based Hierarchical Multi-Modal Fusion

Fast Hierarchical Depth Super-Resolution via Guided Attention

Degradation-Guided Multi-Modal Fusion Network for Depth Map Super-Resolution

IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution

Hierarchical Edge Refinement Network for Guided Depth Map Super-Resolution