Genki Higashiuchi, Tomoyasu Shimada, Xiangbo Kong, Haimin Yan, Hiroyuki Tomiyama
Self-supervised monocular depth estimation is gaining significant attention because it can learn depth from video without expensive ground-truth depth labels. However, many self-supervised models remain too heavy for edge devices, and simply shrinking them tends to degrade accuracy. To address this trade-off, we present MonoLENS, an extension of Lite-Mono. MonoLENS follows a design that reduces computation while preserving geometric fidelity (relative depth relations, boundaries, and planar structures). It advances Lite-Mono by suppressing computation on paths with low geometric contribution, concentrating compute and attention on layers rich in structural cues, and pruning redundant operations in later stages. Our model incorporates two new modules, the DS-Upsampling Block and the MCACoder, along with a simplified encoder. Specifically, the DS-Upsampling Block uses depthwise separable convolutions throughout the decoder, substantially reducing floating-point operations (FLOPs). The MCACoder applies Multidimensional Collaborative Attention (MCA) to the output of the second encoder stage, sharpening edge details in high-resolution feature maps. In addition, we simplify the encoder by reducing the number of blocks in its fourth stage from 10 to 4, further cutting the parameter count. Evaluated on the KITTI and Cityscapes benchmarks, MonoLENS achieves leading performance. On KITTI, MonoLENS reduces the number of model parameters by 42% (to 1.8M) compared with Lite-Mono, while simultaneously improving the squared relative error by approximately 4.5%.
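The abstract names the DS-Upsampling Block but does not spell out its internal layout, so the following is a minimal PyTorch sketch of the underlying idea: a 3×3 depthwise convolution paired with a 1×1 pointwise convolution in place of a dense 3×3 convolution, followed by nearest-neighbor upsampling. The class name, channel widths, normalization, and layer ordering are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class DSUpsamplingBlock(nn.Module):
    """Hypothetical sketch of a decoder upsampling block built from a
    depthwise separable convolution (the exact layout is an assumption)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Depthwise 3x3: one filter per input channel (groups=in_channels),
        # followed by a pointwise 1x1 that mixes channels. Relative to a
        # standard 3x3 convolution, this factorization cuts the multiply-add
        # cost by roughly a factor of 1/out_channels + 1/9.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn(self.pointwise(self.depthwise(x))))
        # Double the spatial resolution, as a decoder stage would.
        return self.upsample(x)


if __name__ == "__main__":
    block = DSUpsamplingBlock(64, 32)
    y = block(torch.randn(1, 64, 48, 160))  # a KITTI-scale feature map
    print(y.shape)  # torch.Size([1, 32, 96, 320])
```

Because the depthwise and pointwise stages are both cheap, stacking such blocks through the decoder is what drives the FLOP reduction the abstract reports, while the 1×1 stage preserves cross-channel mixing.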