Wenda Li, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kazunari Misawa, Kensaku Mori
Abstract

Purpose: Depth estimation is a powerful tool for navigation in laparoscopic surgery. Previous methods use predicted depth maps and the relative poses of the camera to accomplish self-supervised depth estimation. However, the smooth surfaces of organs with textureless regions and the laparoscope's complex rotations make depth and pose estimation difficult in laparoscopic scenes. Therefore, we propose a novel and effective self-supervised monocular depth estimation method with self-attention-guided pose estimation and a joint depth-pose loss function for laparoscopic images.

Methods: We extract feature maps and calculate the minimum re-projection error as a feature-metric loss, establishing constraints on feature maps with more meaningful representations. Moreover, we introduce a self-attention block into the pose estimation network to predict the rotations and translations of the relative poses. In addition, we minimize the difference between predicted relative poses as a pose loss. We combine all of these losses into a joint depth-pose loss.

Results: The proposed method is extensively evaluated on the SCARED and Hamlyn datasets. Quantitative results show that, when all of the proposed components are combined, the method improves the absolute relative error of depth estimation by about 18.07% on SCARED and 14.00% on Hamlyn. The qualitative results show that the proposed method produces smooth depth maps with low error in various laparoscopic scenes. The proposed method also exhibits a trade-off between computational efficiency and performance.

Conclusion: This study considers the characteristics of laparoscopic datasets and presents a simple yet effective self-supervised monocular depth estimation method. We propose a joint depth-pose loss function based on the extracted features for depth estimation on laparoscopic images, guided by a self-attention block. The experimental results demonstrate that all of the proposed components contribute to the proposed method. Furthermore, the proposed method strikes an efficient balance between computational efficiency and performance.
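To make the loss design described in the abstract concrete, the following is a minimal NumPy sketch of a joint depth-pose loss: a feature-metric term taking the per-pixel minimum re-projection error over warped source-frame feature maps, plus a pose-consistency term penalizing disagreement between the predicted forward relative pose and the predicted backward relative pose. Function names, tensor shapes, and the loss weights `w_feat` and `w_pose` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def feature_metric_loss(target_feat, warped_source_feats):
    """Minimum re-projection error computed on feature maps.

    target_feat: (C, H, W) feature map of the target frame.
    warped_source_feats: list of (C, H, W) source-frame feature maps
        warped into the target view using predicted depth and pose.
    """
    # Per-pixel L1 error against each warped source frame, then a
    # per-pixel minimum so an occlusion in one source frame does not
    # dominate the loss (Monodepth2-style minimum re-projection).
    per_pixel = [np.abs(target_feat - wf).mean(axis=0)
                 for wf in warped_source_feats]
    return np.minimum.reduce(per_pixel).mean()

def pose_consistency_loss(T_fwd, T_bwd):
    """Penalize the difference between predicted relative poses.

    T_fwd, T_bwd: 4x4 homogeneous transforms; if both predictions are
    consistent, the forward pose composed with the backward pose is
    the identity.
    """
    return np.abs(T_fwd @ T_bwd - np.eye(4)).mean()

def joint_depth_pose_loss(target_feat, warped_source_feats,
                          T_fwd, T_bwd, w_feat=1.0, w_pose=0.1):
    # Weighted combination of the feature-metric and pose terms
    # (illustrative weights; the paper's weighting may differ).
    return (w_feat * feature_metric_loss(target_feat, warped_source_feats)
            + w_pose * pose_consistency_loss(T_fwd, T_bwd))
```

In a training loop, `target_feat` and `warped_source_feats` would come from a shared feature extractor and a differentiable warping step, with autograd replacing NumPy; the sketch only illustrates how the two terms are combined into one scalar objective.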