JOURNAL ARTICLE

MonoLENS: Monocular Lightweight Efficient Network with Separable Convolutions for Self-Supervised Monocular Depth Estimation

Genki HigashiuchiTomoyasu ShimadaXiangbo KongHaimin YanHiroyuki Tomiyama

Year: 2025 Journal:   Applied Sciences Vol: 15 (19)Pages: 10393-10393   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Self-supervised monocular depth estimation is gaining significant attention because it can learn depth from video without needing expensive ground-truth data. However, many self-supervised models remain too heavy for edge devices, and simply shrinking them tends to degrade accuracy. To address this trade-off, we present MonoLENS, an extension of Lite-Mono. MonoLENS follows a design that reduces computation while preserving geometric fidelity (relative depth relations, boundaries, and planar structures). MonoLENS advances Lite-Mono by suppressing computation on paths with low geometric contribution, focusing compute and attention on layers rich in structural cues, and pruning redundant operations in later stages. Our model incorporates two new modules, the DS-Upsampling Block and the MCACoder, along with a simplified encoder. Specifically, the DS-Upsampling Block uses depthwise separable convolutions throughout the decoder, which greatly lowers floating-point operations (FLOPs). Furthermore, the MCACoder applies Multidimensional Collaborative Attention (MCA) to the output of the second encoder stage, helping to make edge details sharper in high-resolution feature maps. Additionally, we simplified the encoder’s architecture by reducing the number of blocks in its fourth stage from 10 to 4, which resulted in a further reduction of model parameters. When tested on both the KITTI and Cityscapes benchmarks, MonoLENS achieved leading performance. On the KITTI benchmark, MonoLENS reduced the number of model parameters by 42% (1.8M) compared with Lite-Mono, while simultaneously improving the squared relative error by approximately 4.5%.

Keywords:

Metrics

0
Cited By
0.00
FWCI (Field Weighted Citation Impact)
50
Refs
0.39
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Topics

Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Optical measurement and interference techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Processing Techniques and Applications
Physical Sciences →  Engineering →  Media Technology
© 2026 ScienceGate Book Chapters — All rights reserved.