Our work explores the emerging field of cross-modal vehicle re-identification. Accurate cross-modal vehicle re-identification requires a network that captures local details from images of two different modalities while effectively fusing their valid information. However, existing methods consider only high-level semantics, leading to a loss of fine-grained detail and imprecise identification. Moreover, the valid information in each modality has received insufficient attention, as cross-modality interaction has not been thoroughly explored. To address these issues, we propose a new cross-modal vehicle re-identification network consisting of a multi-scale feature fusion module and a cross-modal attention module. Specifically, the multi-scale feature fusion module captures both global high-level semantics and local details by integrating multi-scale information during feature extraction, reducing the loss of local detail. The cross-modal attention module exploits valid information from the different modalities and achieves feature-level fusion. We conducted experiments on the RGBNT100 cross-modal vehicle re-identification dataset to verify the proposed method's effectiveness.
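To make the two modules concrete, the following is a minimal NumPy sketch, not the paper's implementation: multi-scale fusion is illustrated as pyramid average-pooling over a feature map, and cross-modal attention as scaled dot-product attention where tokens of one modality (e.g. RGB) attend over another (e.g. NIR), followed by a residual fusion. All shapes, scale choices, and the residual form are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_descriptor(fmap, scales=(1, 2, 4)):
    """Pyramid pooling sketch of multi-scale fusion (assumed scales).

    fmap: (H, W, C) feature map. Each scale s partitions the map into an
    s x s grid; each cell is average-pooled, so coarse scales keep global
    semantics and fine scales keep local details.
    Returns (1 + 4 + 16, C) = (21, C) tokens for the default scales.
    """
    H, W, C = fmap.shape
    parts = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                cell = fmap[i * H // s:(i + 1) * H // s,
                            j * W // s:(j + 1) * W // s]
                parts.append(cell.mean(axis=(0, 1)))
    return np.stack(parts)

def cross_modal_attention(tok_a, tok_b):
    """Sketch of cross-modal attention: modality A queries modality B.

    tok_a: (Na, C) query tokens, tok_b: (Nb, C) key/value tokens.
    Residual fusion of A with B's attended features (an assumption).
    """
    d = tok_a.shape[-1]
    attn = softmax(tok_a @ tok_b.T / np.sqrt(d))  # (Na, Nb) weights
    return tok_a + attn @ tok_b                   # (Na, C) fused tokens

# Toy feature maps standing in for RGB and NIR backbone outputs.
rng = np.random.default_rng(0)
rgb_map = rng.standard_normal((8, 8, 32))
nir_map = rng.standard_normal((8, 8, 32))
rgb_tok = multi_scale_descriptor(rgb_map)   # (21, 32)
nir_tok = multi_scale_descriptor(nir_map)   # (21, 32)
fused = cross_modal_attention(rgb_tok, nir_tok)
print(fused.shape)  # (21, 32)
```

In a real network these operations would act on learned backbone features with projection weights; the sketch only shows the data flow the abstract describes.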
Aihua Zheng, Xianmin Lin, Jiacheng Dong, Wenzhong Wang, Jin Tang, Bin Luo
Geyan Su, Zhonghua Sun, Kebin Jia, Jinchao Feng
Lihui Lü, Rifan Wang, Zhencong Chen, Jiaqi Chen