With the advance of computer technology and smart devices, many applications, such as face recognition and object recognition, have been developed to make human-computer interaction (HCI) more efficient. In this respect, hand-held object recognition plays an important role in HCI: it can help the computer both understand the user's intentions and meet the user's requirements. In recent years, convolutional neural networks (CNNs) have greatly improved the performance of object recognition, and several works have applied them to hand-held object recognition. However, these supervised models need a large amount of labeled data and many training iterations to fit their large number of parameters. This is a major challenge for HCI, because HCI must respond in real time and it is difficult to collect enough labeled data. In particular, when a new category needs to be learned, updating the model takes a lot of time. In this work, we adopt a one-shot learning method to address this problem. The method does not need to update the model when a new category is learned. Moreover, depth images are robust to lighting and color variation, so we fuse depth information with the color image to exploit the complementary relationship between the two modalities and improve the performance of hand-held object recognition. Experimental results on our hand-held object dataset demonstrate that our method improves hand-held object recognition performance.
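To make concrete the idea of learning a new category without updating the model, the following is a minimal sketch and not the method evaluated in this work. It assumes that a fixed, pre-trained feature extractor has already produced one feature vector per modality, that the two modalities are fused by concatenating L2-normalized features, and that a query is assigned to the stored example with the highest cosine similarity. The names `OneShotGallery`, `fuse`, `add_category`, and `classify` are hypothetical and introduced only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(rgb_feat, depth_feat):
    """Assumed fusion: concatenate the normalized color and depth features."""
    return l2_normalize(l2_normalize(rgb_feat) + l2_normalize(depth_feat))


class OneShotGallery:
    """Stores one fused example per category; no parameters are trained."""

    def __init__(self):
        self.prototypes = {}

    def add_category(self, label, rgb_feat, depth_feat):
        # Learning a new category only stores its single labeled example.
        self.prototypes[label] = fuse(rgb_feat, depth_feat)

    def classify(self, rgb_feat, depth_feat):
        if not self.prototypes:
            raise ValueError("gallery is empty")
        query = fuse(rgb_feat, depth_feat)

        def similarity(proto):
            # Dot product of unit vectors equals cosine similarity.
            return sum(q * p for q, p in zip(query, proto))

        return max(self.prototypes, key=lambda lbl: similarity(self.prototypes[lbl]))


gallery = OneShotGallery()
gallery.add_category("cup", [1.0, 0.0], [1.0, 0.0])
gallery.add_category("phone", [0.0, 1.0], [0.0, 1.0])
print(gallery.classify([0.9, 0.1], [0.8, 0.2]))
```

In this sketch, adding a category is a single dictionary insertion, which is why no retraining step appears; the feature vectors shown are toy values standing in for the outputs of a real extractor.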
Leixian Qiao, Xue Li, Shuqiang Jiang
Lv Xiong, Xinda Liu, Xiangyang Li, Xue Li, Shuqiang Jiang, Zhiqiang He
Lv Xiong, Shuqiang Jiang, Luis Herranz, Shuang Wang
Shuang Liu, Shuang Wang, Lifang Wu, Shuqiang Jiang
Quanquan Shao, Jin Qi, Jin Ma, Yi Fang, Weiming Wang, Jie Hu