A Monocular Depth Estimation Method for Indoor-Outdoor Scenes Based on Vision Transformer

Jianghai Shuai; Ming Li; Yongkang Feng; Yang Li; Sidan Du

doi:10.1109/uemcon59035.2023.10316039

Year: 2023 Vol: 11 Pages: 741-747

Abstract

Monocular depth estimation is an active research direction in computer vision. However, current methods often overlook how much depth ranges differ between indoor and outdoor scenes, which limits a model's generalization ability. To achieve high-precision depth estimation across different depth ranges, we propose a new method. We use the pretrained DINOv2 model as the encoder, combined with a CNN-based decoder, to enhance the network's capacity for extracting global information from indoor and outdoor scenes. We also design a mapping module that transforms diverse depth ranges into a unified 0-1 range, allowing the network to adapt effectively to both indoor and outdoor scenes. We validate our method on the DIODE dataset, which comprises mixed indoor and outdoor scenes. Experimental results demonstrate that our method achieves higher depth estimation accuracy and stronger generalization performance on scenes with diverse depth ranges.
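
The abstract does not say how the mapping module is built, so the following is only an illustrative sketch of the general idea of bringing different depth ranges into a shared 0-1 range. It assumes a log-space min-max scaling with a known per-scene range. The function names (`depth_to_unit`, `unit_to_depth`) and the indoor and outdoor ranges are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical per-scene depth ranges in metres (assumed for illustration,
# not values reported by the paper).
INDOOR_RANGE = (0.1, 10.0)
OUTDOOR_RANGE = (0.1, 100.0)


def depth_to_unit(depth_m, d_min, d_max):
    """Map a metric depth into [0, 1] with log-space min-max scaling."""
    if not 0 < d_min < d_max:
        raise ValueError("expected 0 < d_min < d_max")
    # Clip to the assumed range so the result always stays inside [0, 1].
    clipped = min(max(depth_m, d_min), d_max)
    return math.log(clipped / d_min) / math.log(d_max / d_min)


def unit_to_depth(u, d_min, d_max):
    """Inverse mapping: recover a metric depth from a value in [0, 1]."""
    if not 0 < d_min < d_max:
        raise ValueError("expected 0 < d_min < d_max")
    u = min(max(u, 0.0), 1.0)
    return d_min * (d_max / d_min) ** u


# The same 1 m depth lands at different positions in the unit range,
# depending on which scene range is assumed.
indoor_u = depth_to_unit(1.0, *INDOOR_RANGE)
outdoor_u = depth_to_unit(1.0, *OUTDOOR_RANGE)
```

Under this assumption a network can regress values in a single bounded range for every scene, and the inverse function converts predictions back to metres once the scene range is known. A learned or linear mapping would fit the abstract's description equally well.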

Keywords: Computer science; Monocular; Artificial intelligence; Computer vision; Encoder; Generalization; Range (aeronautics); Transformer; Depth map; Image (mathematics); Mathematics; Engineering

Topics

Advanced Vision and Imaging

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Advanced Image Processing Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

A real-time unsupervised monocular depth estimation method for outdoor scenes

MBUDepthNet: Real-Time Unsupervised Monocular Depth Estimation Method for Outdoor Scenes

Vision Transformer-Based Monocular Depth Estimation for Fisheye Cameras

MobileDepth: Monocular Depth Estimation Based on Lightweight Vision Transformer

ISS Monocular Depth Estimation Via Vision Transformer