This paper proposes using keypoints as a self-supervision cue for learning depth map estimation from a collection of input images. Because ground-truth depth for real images is difficult to obtain, many unsupervised and self-supervised approaches to depth estimation have been proposed. Most of these approaches use the estimated depth map and ego-motion to reproject pixels from the current image into an adjacent image from the collection. The depth and ego-motion estimates are then evaluated by the pixel intensity differences between the corresponding original and reprojected pixels. Instead of reprojecting individual pixels, we propose to first select keypoints in both images and then reproject and compare the corresponding keypoints of the two images. The keypoints should describe the distinctive image features well. By training a deep model with and without the keypoint extraction technique, we show that using the keypoints improves depth estimation learning. We also propose some future directions for keypoint-guided learning of structure-from-motion problems.
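The keypoint reprojection idea described above can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `K = (fx, fy, cx, cy)`, a relative pose given as a rotation `R` and translation `t`, and keypoints that have already been detected and matched between the two images. The function names `reproject` and `keypoint_loss`, and the choice of mean Euclidean distance as the error, are illustrative assumptions and not the formulation used in the paper.

```python
from math import hypot

def reproject(kp, depth, K, R, t):
    """Map pixel kp = (u, v) with the given depth from the current view
    into the adjacent view (hypothetical helper, pinhole model assumed)."""
    fx, fy, cx, cy = K
    u, v = kp
    # Back-project the pixel to a 3D point in the current camera frame.
    X = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Apply the estimated ego-motion: Y = R @ X + t.
    Y = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Project the 3D point into the adjacent image.
    return (fx * Y[0] / Y[2] + cx, fy * Y[1] / Y[2] + cy)

def keypoint_loss(kps_src, depths, kps_tgt, K, R, t):
    """Mean Euclidean distance between reprojected source keypoints and
    their matched target keypoints (assumed error measure)."""
    total = 0.0
    for kp, d, tgt in zip(kps_src, depths, kps_tgt):
        u, v = reproject(kp, d, K, R, t)
        total += hypot(u - tgt[0], v - tgt[1])
    return total / len(kps_src)

if __name__ == "__main__":
    K = (100.0, 100.0, 50.0, 50.0)           # assumed intrinsics
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # no rotation
    t = (0.5, 0.0, 0.0)                      # sideways camera motion
    # A correct depth estimate yields zero error; a wrong one does not.
    print(keypoint_loss([(50.0, 50.0)], [2.0], [(75.0, 50.0)], K, R, t))
    print(keypoint_loss([(50.0, 50.0)], [1.0], [(75.0, 50.0)], K, R, t))
```

In this toy setting the error is zero only when the depth estimate is consistent with the matched keypoint, which is the signal a keypoint-guided self-supervised loss would rely on. A learned model would need a differentiable version of the same computation.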
Xiao Lu, Haoran Sun, Xiuling Wang, Zhiguo Zhang, Haixia Wang
Jingyuan Ma, Xiangyu Lei, Nan Liu, Xian Zhao, Shiliang Pu
Runze Liu, Guanghui Zhang, Dongchen Zhu, Lei Wang, Xiaolin Zhang, Jiamao Li
Yunxiao Shi, Hong Cai, Amin Ansari, Fatih Porikli