JOURNAL ARTICLE

Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval

Cheng DengXinxun XuHao WangMuli YangDacheng Tao

Year: 2020 Journal:   IEEE Transactions on Image Processing Vol: 29 Pages: 8892-8902   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task that involves searching natural images through the use of free-hand sketches under the zero-shot scenario. Most previous methods project the sketch and image features into a low-dimensional common space for efficient retrieval, and meantime align the projected features to their semantic features (e.g., category-level word vectors) in order to transfer knowledge from seen to unseen classes. However, the projection and alignment are always coupled; as a result, there is a lack of alignment that consequently leads to unsatisfactory zero-shot retrieval performance. To address this issue, we propose a novel progressive cross-modal semantic network. More specifically, it first explicitly aligns the sketch and image features to semantic features, then projects the aligned features to a common space for subsequent retrieval. We further employ cross-reconstruction loss to encourage the aligned features to capture complete knowledge about the two modalities, along with multi-modal Euclidean loss that guarantees similarity between the retrieval features from a sketch-image pair. Extensive experiments conducted on two popular large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art competitors to a remarkable extent: by more than 3% on the Sketchy dataset and about 6% on the TU-Berlin dataset in terms of retrieval accuracy.

Keywords:
Computer science Image retrieval Sketch Artificial intelligence Similarity (geometry) Visual Word Projection (relational algebra) Image (mathematics) Information retrieval Modal Pattern recognition (psychology) Algorithm

Metrics

75
Cited By
4.51
FWCI (Field Weighted Citation Impact)
67
Refs
0.95
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
© 2026 ScienceGate Book Chapters — All rights reserved.