JOURNAL ARTICLE

Deep Digging into the Generalization of Self-Supervised Monocular Depth Estimation

Jin Woo Bae, Sungho Moon, Sunghoon Im

Year: 2023 Journal: Proceedings of the AAAI Conference on Artificial Intelligence Vol: 37 (1) Pages: 187-196 Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Self-supervised monocular depth estimation has been widely studied recently. Most of the work has focused on improving performance on benchmark datasets, such as KITTI, but has offered few experiments on generalization performance. In this paper, we investigate backbone networks (e.g., CNNs, Transformers, and CNN-Transformer hybrid models) with respect to the generalization of monocular depth estimation. We first evaluate state-of-the-art models on diverse public datasets that were never seen during network training. Next, we investigate the effects of texture-biased and shape-biased representations using various texture-shifted datasets that we generated. We observe that Transformers exhibit a strong shape bias, whereas CNNs exhibit a strong texture bias. We also find that shape-biased models generalize better for monocular depth estimation than texture-biased models. Based on these observations, we design a new CNN-Transformer hybrid network with a multi-level adaptive feature fusion module, called MonoFormer. The design intuition behind MonoFormer is to increase shape bias by employing Transformers while compensating for the weak locality bias of Transformers by adaptively fusing multi-level representations. Extensive experiments show that the proposed method achieves state-of-the-art performance on various public datasets. Our method also shows the best generalization ability among competing methods.

Keywords:
Monocular, Computer science, Artificial intelligence, Transformer, Generalization, Pattern recognition (psychology), Machine learning, Computer vision, Mathematics, Engineering

Metrics

Cited By: 100
FWCI (Field Weighted Citation Impact): 27.64
Refs: 72
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Image Processing Techniques and Applications
Physical Sciences →  Engineering →  Media Technology
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Optical measurement and interference techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Self-Supervised Monocular Depth Estimation by Digging into Uncertainty Quantification

Yuanzhen Li, Shengjie Zheng, Zi-Xin Tan, Tuo Cao, Fei Luo, Chunxia Xiao

Journal: Journal of Computer Science and Technology Year: 2023 Vol: 38 (3) Pages: 510-525

JOURNAL ARTICLE

Self-Supervised Deep Monocular Depth Estimation With Ambiguity Boosting

Juan Luis Gonzalez Bello, Munchurl Kim

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Year: 2021 Vol: 44 (12) Pages: 9131-9149

BOOK-CHAPTER

Revisiting Self-supervised Monocular Depth Estimation

Ue-Hwan Kim, Gyeong-Min Lee, Jong-Hwan Kim

Lecture notes in networks and systems Year: 2022 Pages: 336-350

BOOK-CHAPTER

Self-Distilled Self-Supervised Monocular Depth Estimation

Julio César Díaz Mendoza, Hélio Pedrini

Series on language processing, pattern recognition, and intelligent systems Year: 2024 Pages: 165-185