As a challenging and highly practical research topic in public safety, person re-identification (Re-ID) has attracted increasing attention in the field of computer vision. Owing to the success of deep learning, Convolutional Neural Networks (CNNs) have become the main technique for extracting discriminative features for person Re-ID. However, real-world pedestrian images suffer from many problems, such as changes in pedestrian pose, inconsistent shooting viewpoints and object occlusion. The global features that CNNs extract from images are easily disturbed by these problems, making them insufficiently robust and discriminative and in turn lowering recognition accuracy. To address these issues, we propose a multi-scale attention network based on multi-feature fusion (MSAN), which adopts a multi-branch deep network structure consisting of a global feature learning branch, two local feature learning branches and a shallow-level feature learning branch. It samples features at different depths of the network and obtains discriminative feature embeddings by combining global and local cues; the sampled features are then fused to identify pedestrians. We also use an attention mechanism to make the network focus on the key information in feature maps of different scales, thereby enhancing the learning of key parts of the human body and alleviating the interference caused by image changes. Experimental results on three mainstream benchmark datasets, Market-1501, DukeMTMC-reID and CUHK03, show that our method significantly improves performance and outperforms most mainstream methods.
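The multi-branch fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the squeeze-and-excitation-style attention module, the fixed random bottleneck weights, the stripe-based local branches and all function names (`channel_attention`, `msan_embedding`) are assumptions made for illustration only.

```python
import numpy as np

def channel_attention(fmap, reduction=4):
    """Squeeze-and-excitation-style channel attention (illustrative;
    the paper's exact attention module may differ)."""
    c, h, w = fmap.shape
    squeeze = fmap.mean(axis=(1, 2))                 # global channel descriptor, shape (c,)
    # Two-layer bottleneck with fixed illustrative weights (untrained).
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gates, shape (c,)
    return fmap * gate[:, None, None]                # reweight each channel

def msan_embedding(deep_fmap, shallow_fmap, num_stripes=2):
    """Fuse a global branch, stripe-based local branches and a
    shallow-level branch into one embedding (hypothetical sketch)."""
    deep_fmap = channel_attention(deep_fmap)
    global_feat = deep_fmap.mean(axis=(1, 2))        # global branch: full-map pooling
    # Local branches: split the map into horizontal stripes, pool each.
    stripes = np.array_split(deep_fmap, num_stripes, axis=1)
    local_feats = [s.mean(axis=(1, 2)) for s in stripes]
    shallow_feat = shallow_fmap.mean(axis=(1, 2))    # shallow branch: earlier-layer features
    return np.concatenate([global_feat, *local_feats, shallow_feat])
```

With a deep feature map of 256 channels, two stripes and a shallow map of 64 channels, the fused embedding has 256 + 2 × 256 + 64 = 832 dimensions; in practice each branch would be trained with its own classification or metric loss before fusion.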
Fenhua Wang, Bo Zhao, Chao Huang, Youqi Yan
Yongjie Wang, Wei Zhang, Yanyan Liu
Penggao Liu, Mingjing Ai, Guozhi Shan
Ke Han, Long Jin, Junpeng Yang, Zongwang Lv