Zishun Liu, Zhenxi Li, Juyong Zhang, Ligang Liu
Local feature descriptors represent image patches as floating-point or binary arrays for computer vision tasks. In this paper, we propose to train Euclidean and Hamming embeddings for image patch description with triplet convolutional networks. Thanks to the learning ability of deep ConvNets, the trained local feature generation method, called Deeply Learned Feature Transform (DELFT), achieves good distinctiveness and robustness. Evaluated on the UBC benchmark, we obtain state-of-the-art results with both floating-point and binary features. Moreover, the learned features can be combined with existing nearest neighbor search algorithms in Euclidean and Hamming space. In addition, we construct a new benchmark to facilitate future related research; it contains 40 million image patches corresponding to 6.7 million 3D points, 25 times larger than the existing dataset. The distinctiveness and robustness of the proposed method are demonstrated in the experimental results.
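The triplet objective mentioned above can be illustrated with a minimal sketch. This is a generic triplet margin loss over descriptor distances and a Hamming distance for binary codes; the function names, margin value, and hinge form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic hinge-style triplet loss on L2 distances.

    Encourages the anchor-positive distance to be smaller than the
    anchor-negative distance by at least `margin`. Illustrative only;
    the paper's exact loss may differ.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())

def hamming_distance(a, b):
    """Hamming distance between two binary codes (arrays of 0/1),
    as used for matching binary descriptors."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))
```

For example, a well-separated triplet (negative far from the anchor) incurs zero loss, while a collapsed triplet incurs exactly the margin.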