JOURNAL ARTICLE

Enhanced-Similarity Attention Fusion for Unsupervised Cross-Modal Hashing Retrieval

Mingyong Li, Mingyuan Ge

Year: 2025   Journal: Data Science and Engineering   Vol: 10 (2)   Pages: 258-276   Publisher: Springer Science+Business Media

Abstract

Although current methods are effective to a degree, unsupervised cross-modal hashing methods still face several common challenges. First, the features extracted from text data are not comprehensive enough to provide sufficient guidance for building the text-modality similarity matrix. Second, the fusion of similarity matrices from different modalities lacks adaptability, leading to a less accurate final similarity matrix. This work proposes Enhanced Similarity Attention Fusion Hashing (ESAFH) to address these problems. First, we construct a text encoder to enrich text features: an adjacency matrix is built to represent the association between pairs of samples, and features are extracted from each sample and its semantic neighbor samples to enhance the text features. Second, we enhance the original similarity matrix by incorporating related information; this step aims to improve the accuracy of similarity estimation by drawing on the enriched text features obtained in the previous step. Finally, we introduce an enhanced attention fusion mechanism that adaptively fuses the similarity matrices from different modalities into a unified inter-modal similarity matrix. This fused matrix guides the learning of hash functions by preserving the most relevant information from each modality. ESAFH is evaluated through comprehensive experiments on three popular datasets. The findings show that on these datasets, ESAFH performs satisfactorily in cross-modal retrieval tasks. In conclusion, by enriching text features, improving the similarity matrix, and using an attention fusion mechanism, ESAFH addresses the shortcomings of current methods.
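
The abstract does not give formulas, so the following is only an illustrative sketch of the three steps it names (neighbor-based text feature enhancement, a recomputed text similarity matrix, and adaptive fusion of the two modality matrices). Everything here is an assumption made for illustration: the function names, the use of cosine similarity, the k-nearest-neighbor adjacency, the blending weight `alpha`, and the per-entry softmax weighting used as a stand-in for the attention fusion. It is not the authors' ESAFH implementation, and hash function learning is left out.

```python
import math

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between row vectors (lists of floats)."""
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in features]
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             / (norms[i] * norms[j]) for j in range(n)] for i in range(n)]

def knn_adjacency(sim, k):
    """Assumed adjacency: link each sample to its k most similar other samples."""
    n = len(sim)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            adj[i][j] = 1.0
    return adj

def enhance_with_neighbors(features, adj, alpha=0.5):
    """Blend each feature vector with the mean of its neighbors' vectors."""
    enhanced = []
    for i, row in enumerate(features):
        nbrs = [j for j, a in enumerate(adj[i]) if a > 0]
        if not nbrs:
            enhanced.append(list(row))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(row))]
        enhanced.append([(1 - alpha) * x + alpha * m for x, m in zip(row, mean)])
    return enhanced

def attention_fuse(sim_a, sim_b, temperature=1.0):
    """Stand-in for attention fusion: per-entry softmax weights over two matrices."""
    n = len(sim_a)
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ea = math.exp(sim_a[i][j] / temperature)
            eb = math.exp(sim_b[i][j] / temperature)
            wa = ea / (ea + eb)  # weight on the first modality for this entry
            fused[i][j] = wa * sim_a[i][j] + (1 - wa) * sim_b[i][j]
    return fused

# Toy features for three image-text pairs (made-up values).
image_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_feats = [[1.0, 0.2], [0.8, 0.0], [0.1, 1.0]]

sim_img = cosine_similarity_matrix(image_feats)
sim_txt = cosine_similarity_matrix(text_feats)
adj = knn_adjacency(sim_txt, k=1)                    # sample association graph
text_enh = enhance_with_neighbors(text_feats, adj)   # enriched text features
sim_txt_enh = cosine_similarity_matrix(text_enh)     # enhanced text similarity
fused = attention_fuse(sim_img, sim_txt_enh)         # unified similarity matrix
```

In this sketch each fused entry is a convex combination of the two modality similarities, so it always lies between them. A learned attention module would replace the fixed softmax weighting.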

Keywords:
Computer science; Hash function; Similarity (geometry); Modal; Fusion; Locality-sensitive hashing; Artificial intelligence; Pattern recognition (psychology); Information retrieval; Data mining; Machine learning; Hash table

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 23.87
Refs: 38
Citation Normalized Percentile: 0.97

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition