JOURNAL ARTICLE

Self-supervised Monocular Depth Estimation with Multi-Scale Feature Fusion

Abstract

Self-supervised monocular depth estimation shows great potential because it does not require ground-truth depth as supervision. Depth is key information for scene understanding; however, real scenes are often complex, and the scales of different objects vary greatly. To alleviate the problems caused by scale variation and small objects, we propose a depth estimation method based on multi-scale feature fusion, which integrates the encoder and decoder features at the same level more thoroughly. Specifically, we design a multi-scale feature fusion (MSFF) module with two branches that perform global context aggregation and local context aggregation on the features, respectively. By further fusing the information from these two branches, the network can simultaneously attend to large objects, which have a more global distribution, and small objects, which have a more local distribution. We conducted a series of experiments on the KITTI dataset, which demonstrate that our method achieves competitive results. Visualizations show that our method produces high-quality depth maps even when the scales of objects in the scene vary greatly.
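
The abstract does not specify the internals of the MSFF branches, so the following NumPy sketch is only one plausible reading, not the authors' implementation. It assumes that the same-level encoder and decoder features share the shape (C, H, W); models global context aggregation as global average pooling followed by a two-layer channel bottleneck; models local context aggregation as a 3x3 neighbourhood average followed by the same kind of bottleneck applied at every pixel; and fuses the two branches into a sigmoid weight that blends the encoder and decoder features. All function names, the reduction ratio `r`, and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def local_mean3x3(x):
    """Average each pixel with its 3x3 neighbourhood (edge padding), per channel."""
    h, w = x.shape[1:]
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    return sum(p[:, i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def bottleneck(v, w1, w2):
    """Channel bottleneck C -> C//r -> C with ReLU; contracts over axis 0 of v."""
    hidden = np.maximum(np.tensordot(w1, v, axes=1), 0.0)
    return np.tensordot(w2, hidden, axes=1)


def global_branch(x, w1, w2):
    """Global context (assumed): pool over H and W, giving one logit per channel."""
    g = x.mean(axis=(1, 2))                       # (C,)
    return bottleneck(g, w1, w2)[:, None, None]   # (C, 1, 1), broadcasts over H, W


def local_branch(x, w1, w2):
    """Local context (assumed): 3x3 neighbourhood average, then per-pixel bottleneck."""
    return bottleneck(local_mean3x3(x), w1, w2)   # (C, H, W)


def msff(enc, dec, params):
    """Hypothetical MSFF-style fusion of same-level encoder/decoder features."""
    x = enc + dec
    logits = global_branch(x, *params["global"]) + local_branch(x, *params["local"])
    attn = sigmoid(logits)                        # values in (0, 1)
    return attn * enc + (1.0 - attn) * dec        # element-wise convex blend


def init_params(c, r=4, seed=0):
    """Random stand-ins for learned weights (hypothetical initialisation)."""
    rng = np.random.default_rng(seed)

    def pair():
        return (rng.standard_normal((c // r, c)) * 0.1,
                rng.standard_normal((c, c // r)) * 0.1)

    return {"global": pair(), "local": pair()}


enc = np.random.default_rng(1).standard_normal((8, 16, 16))
dec = np.random.default_rng(2).standard_normal((8, 16, 16))
out = msff(enc, dec, init_params(8))
print(out.shape)  # (8, 16, 16)
```

Because the global term is constant over H x W while the local term varies per pixel, their sum lets the fusion weight respond both to scene-wide statistics (useful for large objects) and to neighbourhood detail (useful for small objects). In a trained network these weights would be learned, and the local branch could use larger or learned spatial kernels instead of a fixed 3x3 average.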
