Monocular depth estimation suffers from blurry depth predictions, inaccurate distance information, and missing detail in complex scenes. To address these problems, a monocular depth estimation method based on an optimized pyramid vision transformer network is proposed. The encoder uses a pyramid transformer as its backbone: it splits the image into patches and captures the positional relationships between them. It is paired with a lightweight decoder whose feature fusion is improved. Experiments on the dataset demonstrate that the proposed network enhances edge details and improves the accuracy of depth estimation.
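To make the patch-splitting step of the encoder concrete, the following is a minimal sketch, not the proposed network itself. It only shows how an image can be cut into non-overlapping patches, each tagged with its grid position, and how the patch grid shrinks across pyramid stages. The function names `split_into_patches` and `pyramid_grid_sizes`, and the strides 4, 8, 16, 32 (a common choice in pyramid vision transformers), are assumptions for illustration and are not taken from the abstract.

```python
def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch blocks.

    Returns a list of (grid_row, grid_col, flat_pixels) tuples, so every
    patch keeps its position in the patch grid (hypothetical helper).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the block row by row into one token.
            pixels = [image[top + dy][left + dx]
                      for dy in range(patch)
                      for dx in range(patch)]
            tokens.append((top // patch, left // patch, pixels))
    return tokens


def pyramid_grid_sizes(height, width, strides=(4, 8, 16, 32)):
    """Patch-grid size at each pyramid stage (strides are an assumption)."""
    return [(height // s, width // s) for s in strides]


# Toy 8x8 single-channel image whose pixel value encodes its position.
image = [[row * 8 + col for col in range(8)] for row in range(8)]
tokens = split_into_patches(image, patch=4)
grids = pyramid_grid_sizes(64, 64)
```

Here `tokens` holds four patches of 16 pixels each, and `grids` lists progressively coarser grids. In a real encoder each flattened patch would be linearly projected and combined with a position embedding before the attention layers.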
M. Angelin Ponrani, P. Ezhilarasi, S. Rajeshkannan
Qiuchen Wang, Hui Shuai, Qingshan Liu