In recent years, supervised person re-identification (re-id) has made significant progress in matching pedestrians across disjoint surveillance camera views. However, when a re-id system is extended to a new scene, sufficient labeled images are often unavailable, which makes supervised training infeasible. Unsupervised re-id methods are therefore vital for reducing labeling costs. A crucial challenge for unsupervised person re-id is cross-camera scene variation, such as occlusion, which produces uneven pairwise similarity distributions and degrades matching performance. To address this issue, we propose a local manifold consistency learning (LMCL) framework that consists of a context-aware feature embedding network and a camera-aware manifold alignment strategy. To extract more comprehensive features of the persons in images, we propose a saliency feature attention algorithm that crops feature maps into regions and transforms them into context features. To alleviate the effect of cross-camera scene variation, we optimize the model with a sub-domain alignment loss, which reduces the distance between sub-domains composed of similar samples. Extensive experiments and ablation studies verify the effectiveness of the LMCL approach.
Yuxin Zhang, Teng Zhu, Baopeng Zhang
Yanbing Geng, Yongjian Lian, Fangshu Cui, Xiaowei Zhang, Mingliang Zhou, G.H. Zhang
Bin Yang, Jun Chen, Cuiqun Chen, Mang Ye