Umar Asif, Mohammed Bennamoun, Ferdous Sohel
© 2015 IEEE. This paper presents an efficient approach to recognizing objects captured with an RGB-D sensor. The proposed approach uses a Bag-of-Words (BOW) model to learn feature representations from raw RGB-D point clouds in a weakly supervised manner. To this end, we introduce a novel method based on randomized clustering trees to learn visual vocabularies that are fast to compute and more discriminative than the vocabularies generated by classical methods such as k-means. We show that, when combined with standard spatial pooling strategies, the proposed approach yields a powerful feature representation for RGB-D object recognition. Our extensive experimental evaluation on two challenging RGB-D object datasets and on live video streams from a Kinect sensor shows that the learned features achieve higher object recognition accuracy than state-of-the-art methods.
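To make the idea of a tree-based visual vocabulary concrete, the following is a minimal sketch and not the authors' method. It assumes a single, fully unsupervised tree with random axis-aligned splits, whereas the abstract describes weakly supervised learning. Each leaf acts as one visual word, and a simple sum pooling over a 2x2 grid stands in for the "standard spatial pooling strategies". All function names (`build_tree`, `lookup`, `encode`, `spatial_pool`), the descriptor dimension, and the random toy data are hypothetical choices for illustration.

```python
import random
from itertools import count


def build_tree(points, depth, rng, word_ids):
    """Split descriptors recursively with random axis-aligned tests.

    Returns either an int (leaf = visual-word id) or a tuple
    (dim, threshold, left_subtree, right_subtree).
    """
    if depth > 0 and len(points) > 1:
        dim = rng.randrange(len(points[0]))      # random feature dimension
        values = [p[dim] for p in points]
        lo, hi = min(values), max(values)
        if lo < hi:
            thr = rng.uniform(lo, hi)            # random split threshold
            left = [p for p in points if p[dim] < thr]
            right = [p for p in points if p[dim] >= thr]
            if left and right:
                return (dim, thr,
                        build_tree(left, depth - 1, rng, word_ids),
                        build_tree(right, depth - 1, rng, word_ids))
    return next(word_ids)                        # leaf: assign the next word id


def count_words(tree):
    """Number of leaves, i.e. the vocabulary size."""
    if isinstance(tree, int):
        return 1
    return count_words(tree[2]) + count_words(tree[3])


def lookup(tree, point):
    """Drop a descriptor down the tree and return its visual-word id."""
    while not isinstance(tree, int):
        dim, thr, left, right = tree
        tree = left if point[dim] < thr else right
    return tree


def encode(tree, n_words, descriptors):
    """L1-normalized bag-of-words histogram over the whole object."""
    hist = [0.0] * n_words
    for d in descriptors:
        hist[lookup(tree, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def spatial_pool(tree, n_words, descriptors, positions, grid=2):
    """Sum-pool word counts per cell of a grid x grid layout and concatenate.

    positions holds (x, y) pairs normalized to [0, 1].
    """
    cells = [[0.0] * n_words for _ in range(grid * grid)]
    for d, (x, y) in zip(descriptors, positions):
        cx = min(int(x * grid), grid - 1)
        cy = min(int(y * grid), grid - 1)
        cells[cy * grid + cx][lookup(tree, d)] += 1.0
    return [v for cell in cells for v in cell]


# Toy data: random 8-D vectors stand in for local RGB-D descriptors (assumption).
rng = random.Random(0)
train = [[rng.random() for _ in range(8)] for _ in range(200)]
tree = build_tree(train, depth=4, rng=rng, word_ids=count())
n_words = count_words(tree)

query = [[rng.random() for _ in range(8)] for _ in range(50)]
positions = [(rng.random(), rng.random()) for _ in range(50)]
hist = encode(tree, n_words, query)
pooled = spatial_pool(tree, n_words, query, positions)
```

Building such a tree needs only one random test per node, while each k-means iteration computes the distance from every descriptor to every cluster center. That is one plausible reading of the "fast to compute" claim. In practice several trees would likely be combined into a forest and their histograms concatenated.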
Liefeng Bo, Xiaofeng Ren, Dieter Fox
Sensen Tu, Yingjian Xue, X Zhang, Xun Huang, Haiping Lin
Zeyu Chen, Mingyu Zhu, Shuhan Chen, Lu Lu, Haonan Tang, Xuelong Hu, Chunfan Ji
Yanhua Cheng, Xin Zhao, Kaiqi Huang, Tieniu Tan
Yen-Yu Lin, Jyun-Fan Tsai, Tyng-Luh Liu