Fine-grained classification involves localizing and aggregating the distinguishing points between similar regions, based on the correlation among these known regions. Localizing such discriminative regions accurately in a multi-domain scenario therefore remains a major challenge. This work proposes an improved framework based on patch triplets to address this localization problem in an image. Triplets of similar feature points, subject to logical constraints, are used to improve the accuracy of region localization and to automatically extract distinct features as geometrically constrained triplets for classification. The resulting approach presents only object bounding and detection boxes, as in the Region-Proposal Convolutional Neural Network (RCNN) approach. Its efficiency is demonstrated on publicly available fine-grained datasets, where it outperforms or obtains results comparable to existing methods. Several simple real-world household objects are used as input images. In our trial, Scale Invariant Feature Transform (SIFT)-based key points are extracted and processed using channel and convolution operations. The model is retrained to recognize four different types of metal objects, with the entire process requiring four hours to explain and prepare each strong piece. A major benefit of the fine-grained Faster-RCNN approach is better object identification performance with SIFT key points.
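To make the idea of a geometrically constrained triplet concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes that two triplets of key points are considered consistent when they share the same turn direction and have similar interior angles; the function names and the 20-degree tolerance are hypothetical choices made for illustration only.

```python
import math

def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a).

    Returns +1 or -1 depending on the turn direction a -> b -> c
    (which one counts as clockwise depends on the axis convention),
    and 0 if the three points are collinear or not distinct.
    """
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)

def interior_angles(a, b, c):
    """Interior angles (in radians) at vertices a, b and c of a triangle."""
    def angle_at(p, q, r):
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        # Clamp to [-1, 1] to guard against floating-point drift.
        return math.acos(max(-1.0, min(1.0, dot / norm)))
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)

def triplets_consistent(t1, t2, max_angle_diff=math.radians(20)):
    """Assumed constraint: same turn direction and similar shape.

    t1 and t2 are triplets of (x, y) key-point locations, listed in
    corresponding order. The tolerance is an illustrative value.
    """
    o1, o2 = orientation(*t1), orientation(*t2)
    if o1 == 0 or o2 == 0 or o1 != o2:
        return False  # degenerate triplet or flipped ordering
    return all(
        abs(x - y) <= max_angle_diff
        for x, y in zip(interior_angles(*t1), interior_angles(*t2))
    )

reference = ((0, 0), (4, 0), (0, 3))
scaled_and_shifted = ((10, 10), (18, 10), (10, 16))  # same shape, twice the size
mirrored = ((0, 0), (0, 3), (4, 0))                  # same points, opposite order

print(triplets_consistent(reference, scaled_and_shifted))
print(triplets_consistent(reference, mirrored))
```

Because both checks depend only on angles and turn direction, this assumed constraint is insensitive to translation and uniform scaling of the key points, which is one plausible reason to prefer it over raw pixel distances.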