JOURNAL ARTICLE

Transformer-Based Deep Hashing Method for Multi-Scale Feature Fusion

Abstract

Deep image hashing aims to map an input image to simple binary hash codes via deep neural networks. Motivated by recent advances in Vision Transformers (ViT), many ViT-based deep hashing methods have been proposed. Nevertheless, ViT has an enormous number of model parameters and high computational complexity. Moreover, only the classification token output by the last ViT layer is used as the image feature vector, while the remaining vectors are discarded. This makes the model computationally inefficient and neglects useful image information. Therefore, this paper proposes a Transformer-based deep hashing method for multi-scale feature fusion (TDH). Specifically, we use a hierarchical Transformer backbone to capture both global and local features of images. The hierarchical Transformer uses a local self-attention mechanism to process image blocks in parallel, which reduces computational complexity and improves computational efficiency. A multi-scale feature fusion module captures all image feature vectors output by the hierarchical Transformer to obtain richer image feature information. We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE, and ImageNet. The experimental results show that the proposed method outperforms existing state-of-the-art methods. Source code is available at https://github.com/shuaichaochao/TDH.
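The pipeline described in the abstract (pool features from several backbone stages, fuse them, and binarize the result into a hash code) can be illustrated with a minimal NumPy sketch. This is not the authors' TDH implementation: the stage shapes, the mean-pool-and-concatenate fusion, the random projection, and the function names `fuse_multiscale`, `to_hash_code`, and `hamming_distance` are all assumptions chosen only to show the general idea.

```python
import numpy as np


def fuse_multiscale(stage_tokens):
    """Mean-pool the tokens of each stage and concatenate the pooled vectors.

    Assumed fusion rule for illustration; the paper's module may differ.
    """
    pooled = [tokens.mean(axis=0) for tokens in stage_tokens]  # one vector per stage
    return np.concatenate(pooled)


def to_hash_code(feature, projection):
    """Project a fused feature to n_bits values and binarize them to {0, 1}."""
    return (feature @ projection > 0).astype(np.uint8)


def hamming_distance(code_a, code_b):
    """Count the bit positions where two hash codes differ."""
    return int(np.count_nonzero(code_a != code_b))


rng = np.random.default_rng(0)

# Hypothetical outputs of three hierarchical stages: (num_tokens, channels).
# Later stages have fewer tokens and more channels.
stage_shapes = [(64, 96), (16, 192), (4, 384)]
stage_tokens = [rng.standard_normal(shape) for shape in stage_shapes]

fused = fuse_multiscale(stage_tokens)  # length 96 + 192 + 384 = 672

n_bits = 32
# Random projection stands in for a learned hash layer (assumption).
projection = rng.standard_normal((fused.shape[0], n_bits))
code = to_hash_code(fused, projection)
```

In a retrieval setting, database images would be ranked by `hamming_distance` between their codes and the query code; a trained model would replace the random projection with a learned hash layer.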

Keywords:
Computer science; Hash function; Artificial intelligence; Fusion; Transformer; Scale (ratio); Pattern recognition (psychology); Engineering; Computer security; Electrical engineering; Cartography; Voltage

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 1.46
Refs: 23
Citation Normalized Percentile: 0.78

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Res2former: A multi-scale fusion based transformer feature extraction method

Bojun Xie, Yanjie Wang, Shaocong Guo, Junfen Chen

Journal: Journal of Visual Communication and Image Representation Year: 2025 Vol: 112 Pages: 104546-104546
BOOK-CHAPTER

Multi-feature Fusion-Based Central Similarity Deep Supervised Hashing

Chao He, Hongxi Wei, Kai Lü

Lecture notes in computer science Year: 2023 Pages: 335-345
JOURNAL ARTICLE

Image Recognition based on Multi-scale Feature Fusion Transformer

Zhefeng Zhu, Ke Qi, Wenbin Chen, Yicong Zhou, Peiyue Li, Zhenxian Liu

Journal: 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA) Year: 2022 Pages: 7-13
JOURNAL ARTICLE

Multi-scale fusion transformer based weakly supervised hashing learning for instance retrieval

Yuanhai Lv, Chen Jiao, Wanqing Zhao, Wei Zhao, Ziyu Guan, Xiaofei He

Journal: International Journal of Machine Learning and Cybernetics Year: 2023 Vol: 14 (12) Pages: 4431-4442