Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality retrieval task. Existing methods usually focus on extracting discriminative visual features while ignoring the reliability and commonality of visual features across modalities. In this paper, we propose a new deep learning framework, called Multi-scale Local progressive Transformer (MLT), for effective VI-ReID. To reduce the negative impact of the modality gap, we first introduce grayscale images as an auxiliary modality, adopt a Transformer model as the baseline, and propose a progressive learning strategy. We further fuse the Sea attention mechanism with DilateFormer to improve the discriminability of reliable features, and verify its effectiveness through ablation experiments.
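As a minimal sketch of the auxiliary-modality idea described above (not the paper's exact pipeline): a visible RGB image can be converted to a grayscale image and replicated to three channels so it passes through the same backbone as visible and infrared inputs, acting as an intermediate modality during training. The function name, luma coefficients, and batch shapes below are assumptions for illustration.

```python
import torch

def to_grayscale_modality(rgb_batch: torch.Tensor) -> torch.Tensor:
    """Turn a batch of visible (RGB) images into a grayscale auxiliary
    modality: single-channel luminance replicated to 3 channels, so it
    is shape-compatible with the visible/infrared branches.

    rgb_batch: (N, 3, H, W) tensor, channels in RGB order.
    Returns: (N, 3, H, W) grayscale tensor.
    """
    # ITU-R BT.601 luma coefficients (an assumed choice; any standard
    # RGB-to-gray conversion would serve the same purpose here).
    weights = torch.tensor([0.299, 0.587, 0.114],
                           device=rgb_batch.device,
                           dtype=rgb_batch.dtype).view(1, 3, 1, 1)
    gray = (rgb_batch * weights).sum(dim=1, keepdim=True)  # (N, 1, H, W)
    return gray.expand(-1, 3, -1, -1)                      # (N, 3, H, W)

# Usage: pair each visible batch with its grayscale counterpart so the
# model sees visible, grayscale, and infrared samples during training.
visible = torch.rand(8, 3, 256, 128)  # hypothetical batch and image size
gray = to_grayscale_modality(visible)
```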
Zifei Qin, Peishun Liu, Yibei Liu, Haiping Duan, Li Fei-Fei, Han Wang