JOURNAL ARTICLE

Tri-Level Modality-Information Disentanglement for Visible-Infrared Person Re-Identification

Zefeng Lu, Ronghao Lin, Haifeng Hu

Year: 2023   Journal: IEEE Transactions on Multimedia   Vol: 26   Pages: 2700-2714   Publisher: Institute of Electrical and Electronics Engineers

Abstract

VIS-NIR person re-identification (Re-ID), which aims to match person identities between daytime VISible (VIS) and nighttime Near-InfraRed (NIR) images, has attracted increasing attention due to its wide applications in low-light scenes. However, the dramatic modality discrepancy between VIS and NIR images leads to a considerable intra-class gap in the feature space, which hampers identity matching. To bridge the modality gap, we propose a Tri-level Modality-information Disentanglement (TMD) framework that disentangles modality information at the levels of the raw image, the feature distribution, and the instance features. Our model consists of three key modules, a Style-Aligned Converter (SAC), a Two-Steps Wasserstein Loss (TSWL), and a Self-supervised Orthogonal Disentanglement (SOD), which handle modality information at these three levels. First, to reduce the modality discrepancy at the image level, the SAC generates style-aligned images through the designed style converter and an $\mathcal{A}$-distance learning approach; it effectively alleviates the style discrepancy between VIS and NIR images with a negligible increase in model complexity. Second, considering the heterogeneity of the VIS and NIR feature distributions caused by structure- and style-misaligned raw images, the TSWL reduces the VIS-NIR gap at the distribution level through two alignment steps: after generating style-consistent images, it eliminates the modality-related discrepancy by aligning the distributions of the structure-aligned original and generated VIS/NIR images, and bridges the modality-unrelated gap by aligning the style-consistent generated VIS and NIR images. Third, to further reduce the modality discrepancy at the instance level, the SOD imposes orthogonal constraints between the extracted modality- and identity-related features. Because the modality-related factors are disentangled from the instance features, TMD efficiently learns modality-unrelated, identity-discriminative representations that are well suited to person Re-ID on VIS-NIR images. Comprehensive experiments on two cross-modality pedestrian Re-ID datasets demonstrate the effectiveness of TMD.
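
To make the feature-space ideas in the abstract concrete, the following is a minimal PyTorch sketch of two building blocks it describes: an orthogonality penalty between identity- and modality-related features (the intuition behind SOD) and a sliced approximation of the Wasserstein distance for aligning two feature distributions (a stand-in for the alignment steps in TSWL). This is an illustration under assumptions, not the authors' implementation: the function names (`orthogonal_loss`, `sliced_wasserstein`), tensor shapes, and the sliced approximation itself are choices made here for brevity.

```python
import torch
import torch.nn.functional as F

def orthogonal_loss(f_id: torch.Tensor, f_mod: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between identity- and modality-related features.

    Both inputs are (batch, dim). The loss is the mean squared cosine
    similarity between the two feature vectors of each sample; it is zero
    exactly when the vectors are orthogonal.
    """
    f_id = F.normalize(f_id, dim=1)
    f_mod = F.normalize(f_mod, dim=1)
    return (f_id * f_mod).sum(dim=1).pow(2).mean()

def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor, n_proj: int = 64) -> torch.Tensor:
    """Approximate the Wasserstein-1 distance between two feature batches.

    x and y are (batch, dim) with the same batch size. Features are projected
    onto random unit directions; along each direction the 1-D Wasserstein
    distance is the mean absolute difference of the sorted projections, and
    the result is averaged over directions.
    """
    d = x.size(1)
    proj = F.normalize(torch.randn(d, n_proj, device=x.device), dim=0)  # (dim, n_proj) unit columns
    x_sorted = (x @ proj).sort(dim=0).values  # (batch, n_proj), sorted per direction
    y_sorted = (y @ proj).sort(dim=0).values
    return (x_sorted - y_sorted).abs().mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    vis_feat = torch.randn(32, 256)  # toy VIS-branch features
    nir_feat = torch.randn(32, 256)  # toy NIR-branch features
    id_feat, mod_feat = torch.randn(32, 256), torch.randn(32, 256)
    print("orthogonality penalty:", orthogonal_loss(id_feat, mod_feat).item())
    print("sliced W1 (VIS vs. NIR):", sliced_wasserstein(vis_feat, nir_feat).item())
```

In the paper, the TSWL applies its alignment twice (original versus generated images within each modality, then the style-consistent generated VIS versus NIR images), and the SOD learns the modality branch in a self-supervised manner; the sketch only shows the kind of loss terms such constraints would plug into.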

Keywords:
Modality (human–computer interaction), Computer science, Feature (linguistics), Artificial intelligence, Matching (statistics), Pattern recognition (psychology), Identification (biology), Identity (music), Computer vision, Mathematics, Physics, Statistics

Metrics

Cited By: 32
FWCI (Field Weighted Citation Impact): 5.82
Refs: 70
Citation Normalized Percentile: 0.96
Is in top 1%
Is in top 10%

Topics

Video Surveillance and Tracking Methods (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Image Enhancement Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

JOURNAL ARTICLE

Bidirectional modality information interaction for Visible–Infrared Person Re-identification

Xi Yang, Huanling Liu, Nannan Wang, Xinbo Gao

Journal: Pattern Recognition   Year: 2024   Vol: 161   Pages: 111301
JOURNAL ARTICLE

Identity Feature Disentanglement for Visible-Infrared Person Re-Identification

Xiumei Chen, Xiangtao Zheng, Xiaoqiang Lu

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications   Year: 2023   Vol: 19 (6)   Pages: 1-20
JOURNAL ARTICLE

FMCNet+: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification

Ruida Xi, Nianchang Huang, Changzhou Lai, Qiang Zhang, Jungong Han

Journal: IEEE Transactions on Neural Networks and Learning Systems   Year: 2024   Vol: 36 (7)   Pages: 13247-13261