Abstract
Person re-identification (Re-ID) algorithms retrieve images of the same pedestrian from a gallery captured by multiple cameras, given a query image of that pedestrian. Because pedestrian posture, illumination, and camera viewpoint vary, improving Re-ID accuracy remains a significant challenge. Attention mechanisms can alleviate some of these issues, but attention-based methods tend to focus excessively on features in the most salient image regions and ignore discriminative features outside them, so the extracted features are insufficiently discriminative. To address this problem, we propose a Multi-level Salient Feature Mining Network (MSFM-Net). First, by embedding attention modules in ResNet50, the model extracts the most salient pedestrian feature maps. Second, the model uses two sub-salient feature mining branches to extract the second-level and third-level salient feature maps (collectively referred to as sub-salient feature maps). Third, the model uses a feature map fusion module to combine the most salient feature maps with the sub-salient feature maps, yielding fused salient feature maps. Finally, the model pools the fused salient feature maps to produce more discriminative pedestrian representations. Results on two benchmark datasets demonstrate that MSFM-Net's performance reaches the level of current advanced methods.
Haishun Du, Zhaoyang Li, Panting Liu, Linbing He, Dongdong Huo
Huiyan Wu, Ming Xin, Fang Wen, Hai-Miao Hu, Zihao Hu
Yunzuo Zhang, Weili Kang, Yameng Liu, Pengfei Zhu
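The four steps listed in the abstract (most salient features, two sub-salient levels, fusion, pooling) can be illustrated with a toy sketch. The abstract does not say how the sub-salient branches work, so everything below is an assumption made for illustration: each level keeps the top fraction of activations, positions claimed by an earlier level are suppressed before the next level is mined, fusion is an element-wise sum, and pooling is a global average. The function names are hypothetical, and a single 4x4 non-negative map stands in for the ResNet50 feature maps and learned attention modules of the actual model.

```python
# Toy sketch only: a hand-written 2D map replaces CNN feature maps,
# and a top-k threshold replaces learned attention (both are assumptions).

def saliency_mask(fmap, keep_ratio):
    """Return a 0/1 mask over the top `keep_ratio` fraction of activations.

    Ties at the threshold are all kept, so the mask may cover a few more
    positions than the requested fraction.
    """
    flat = sorted((v for row in fmap for v in row), reverse=True)
    k = max(1, int(len(flat) * keep_ratio))
    threshold = flat[k - 1]
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in fmap]

def apply_mask(fmap, mask):
    """Keep only the activations selected by the mask."""
    return [[v * m for v, m in zip(r, mr)] for r, mr in zip(fmap, mask)]

def suppress(fmap, mask):
    """Zero out positions already claimed by a more salient level."""
    return [[v * (1.0 - m) for v, m in zip(r, mr)] for r, mr in zip(fmap, mask)]

def mine_levels(fmap, num_levels=3, keep_ratio=0.125):
    """Extract `num_levels` salient maps, most salient first.

    Assumes non-negative activations (for example, post-ReLU), so that
    suppressed positions never outrank the remaining ones.
    """
    remaining = fmap
    mined = []
    for _ in range(num_levels):
        mask = saliency_mask(remaining, keep_ratio)
        mined.append(apply_mask(remaining, mask))
        remaining = suppress(remaining, mask)
    return mined

def fuse(maps):
    """Fuse maps by element-wise sum (one simple choice among many)."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(w)] for i in range(h)]

def global_avg_pool(fmap):
    """Collapse a map into a single scalar descriptor."""
    vals = [v for row in fmap for v in row]
    return sum(vals) / len(vals)

# Hypothetical activation map with three regions of decreasing saliency.
fmap = [
    [9.0, 8.0, 1.0, 0.0],
    [7.0, 6.0, 1.0, 0.0],
    [2.0, 2.0, 5.0, 4.0],
    [0.0, 0.0, 3.0, 3.0],
]

levels = mine_levels(fmap)           # most salient, second-level, third-level
fused = fuse(levels)                 # fused salient feature map
descriptor = global_avg_pool(fused)  # pooled representation
print(descriptor, global_avg_pool(levels[0]))
```

In this toy setting the descriptor pooled from the fused map reflects activations from all three levels, whereas pooling the first level alone reflects only the two strongest positions. That difference is the intuition behind mining sub-salient features, not a reproduction of the paper's modules.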