Hierarchical Attention Image-Text Alignment Network For Person Re-Identification

Kajal Kansal; A V Subramanyam; Zheng Wang; Shin’ichi Satoh

doi:10.1109/icmew53276.2021.9455960

Year: 2021 Vol: 9 Pages: 1-6

Abstract

Description-based Person Re-identification (Re-ID) is a crucial cross-modality task that aims to retrieve a specific person given a textual description. Existing description-based Re-ID methods focus on learning robust representations to measure the similarity between the global features of the two modalities. However, such a global mapping disregards the semantic consistency between local visual and linguistic features. Two further challenges remain:

- alignment uncertainty, which arises from poor correspondence between text-image pairs;
- text complexity, which arises from irrelevant words.

To address these, we propose an end-to-end Hierarchical Attention Image-Text Alignment Network, named HAITA-Net. Our model comprises:

i) a hierarchical attention alignment network. It determines the potential relationships between image content and textual information at the word-patch, phrase-patch, and sentence-image levels, which addresses alignment uncertainty;
ii) a new Term Frequency-Inverse Document Frequency (TF-IDF) thresholding strategy. It extracts the salient tokens, which alleviates text complexity.

The network is optimized end to end with a joint weighted hierarchical attention loss and a cross-modal loss. Extensive experiments demonstrate the effectiveness of our method.
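The abstract uses TF-IDF thresholding to discard irrelevant words from a description, but it does not give the exact weighting or threshold. The following is a minimal sketch under stated assumptions, not the authors' implementation:

- It uses the unsmoothed IDF log(N/df) and length-normalized term frequency.
- It uses a simple lowercase tokenizer, a three-sentence toy corpus, and an arbitrary threshold of 0.1.
- The function names `tokenize`, `idf_table`, and `salient_tokens` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(corpus):
    """Inverse document frequency idf(t) = log(N / df(t)) over all descriptions."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each token at most once per description
    return {tok: math.log(n_docs / cnt) for tok, cnt in df.items()}


def salient_tokens(description, idf, threshold):
    """Return tokens of `description` whose TF-IDF score exceeds `threshold`.

    TF is the token count divided by the description length. Tokens missing
    from `idf` get the largest IDF seen, i.e. they are treated as maximally rare.
    """
    tokens = tokenize(description)
    if not tokens:
        return []
    unseen_idf = max(idf.values(), default=0.0)
    tf = Counter(tokens)
    score = {t: (c / len(tokens)) * idf.get(t, unseen_idf) for t, c in tf.items()}
    kept, seen = [], set()
    for tok in tokens:  # keep the first occurrence of each salient token, in order
        if tok not in seen and score[tok] > threshold:
            kept.append(tok)
            seen.add(tok)
    return kept


corpus = [
    "A woman wearing a red dress and black shoes.",
    "A man wearing a blue shirt and black pants.",
    "A man with a backpack wearing a white shirt.",
]
idf = idf_table(corpus)
print(salient_tokens(corpus[0], idf, threshold=0.1))
# ['woman', 'red', 'dress', 'shoes']
```

Words that appear in every description, such as "a" and "wearing", get an IDF of zero. Shared attributes such as "black" score below the threshold. The distinctive tokens survive. How aggressive the filtering is depends entirely on the assumed threshold and IDF variant.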

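The abstract describes alignment at the word-patch, phrase-patch, and sentence-image levels. It also describes training with a joint weighted hierarchical attention loss plus a cross-modal loss, but it gives no formulas. The following is one plausible instantiation and should not be read as HAITA-Net itself. It makes these assumptions:

- The word, phrase, sentence, patch, and image features are already extracted.
- Each local level uses SCAN-style text-to-image cross-attention, a swapped-in technique, with a hypothetical temperature of 10.
- The three level scores are combined with hypothetical weights.
- A hinge triplet ranking loss stands in for the cross-modal objective.

All function names are hypothetical, and plain lists are used instead of tensors for readability.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_score(queries, patches, temperature=10.0):
    """Assumed SCAN-style text-to-image attention score.

    Each textual query (word or phrase) attends over image patches. The score
    is the mean cosine between each query and its attended visual context.
    """
    scores = []
    for q in queries:
        attn = softmax([temperature * cosine(q, v) for v in patches])
        context = [sum(a * v[k] for a, v in zip(attn, patches)) for k in range(len(q))]
        scores.append(cosine(q, context))
    return sum(scores) / len(scores)


def hierarchical_similarity(words, phrases, sentence, patches, image,
                            weights=(1.0, 1.0, 1.0)):
    """Weighted sum of word-patch, phrase-patch and sentence-image scores."""
    s_wp = attend_score(words, patches)
    s_pp = attend_score(phrases, patches)
    s_si = cosine(sentence, image)
    return weights[0] * s_wp + weights[1] * s_pp + weights[2] * s_si


def ranking_loss(pos_sim, neg_sim, margin=0.2):
    """Hinge triplet loss: the matching pair should beat a mismatch by `margin`."""
    return max(0.0, margin - pos_sim + neg_sim)


# Toy 3-d features; real features would come from image and text encoders.
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. "red", "shirt"
phrases = [[1.0, 1.0, 0.0]]                  # e.g. "red shirt"
sentence = [1.0, 1.0, 0.0]
match_patches, match_image = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 1.0, 0.0]
other_patches, other_image = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 1.0, 1.0]

pos = hierarchical_similarity(words, phrases, sentence, match_patches, match_image)
neg = hierarchical_similarity(words, phrases, sentence, other_patches, other_image)
print(pos > neg, ranking_loss(pos, neg))
# True 0.0
```

The local levels let individual words or phrases match the patches they describe, even when the global sentence-image score is only partial. The level weights are where a "weighted hierarchical" loss would plausibly put its emphasis. A practical version would batch these computations and draw negatives from the mini-batch.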
Related Documents

Cross-Modal Alignment Enhancement Network for Text-to-Image Person Re-Identification

Cross-modal feature learning and alignment network for text–image person re-identification

Multimodal Feature Hierarchical Fusion for Text-Image Person Re-identification

Implicit Alignment-Based Cross-Modal Symbiotic Network for Text-to-Image Person Re-Identification

Fine-grained alignment network and local attention network for person re-identification