In deep learning-based Monocular Depth Estimation (MDE), Vision Transformers (ViTs) have attracted substantial attention as a network backbone owing to their distinctive structure and powerful attention mechanism. However, compared to Convolutional Neural Networks (CNNs), ViTs are limited in capturing spatial features and are less sensitive to local information. This study addresses the underutilization of valuable local information by ViTs and the often-overlooked contribution of the decoder to overall performance. To tackle these challenges, we propose a hybrid network that leverages the strengths of both ViTs and CNNs to capture local information as well as long-range dependencies. In the encoder, we introduce a patch attention mechanism that assigns varying levels of attention to different regions. In the decoder, a cross-attention mechanism is devised to enhance feature fusion. Extensive experiments on diverse datasets, including KITTI, DIW, DIODE, and Sintel, show that our approach produces more effective and representative features, yielding performance improvements of up to 13.98% over state-of-the-art benchmarks on the MDE task.
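To illustrate the kind of decoder-side fusion the abstract describes, the following is a minimal sketch of a cross-attention fusion block in which decoder tokens query a CNN skip feature map. This is not the authors' released implementation; the class name `CrossAttentionFusion`, the dimensions, and the residual/MLP layout are illustrative assumptions.

```python
# Hypothetical sketch of cross-attention feature fusion (not the paper's exact module):
# decoder (ViT-branch) tokens act as queries, CNN skip features as keys/values.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, dec_tokens: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        # dec_tokens: (B, N, C) decoder tokens used as queries.
        # cnn_feat:   (B, C, H, W) CNN skip feature map used as keys/values.
        kv = cnn_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        q = self.norm_q(dec_tokens)
        kv = self.norm_kv(kv)
        fused, _ = self.attn(q, kv, kv)            # cross-attention
        x = dec_tokens + fused                     # residual fusion
        return x + self.mlp(x)


# Toy usage: a 16x16 token grid attends over a 32x32 CNN skip map.
tokens = torch.randn(2, 16 * 16, 256)
skip = torch.randn(2, 256, 32, 32)
print(CrossAttentionFusion()(tokens, skip).shape)  # torch.Size([2, 256, 256])
```

Under these assumptions, the cross-attention lets each decoder token selectively aggregate fine-grained CNN detail, which is one plausible way to realize the feature fusion the abstract refers to.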