JOURNAL ARTICLE

Dual-Mix for Cross-Modal Retrieval with Noisy Labels

Abstract

Cross-modal retrieval with deep neural networks relies heavily on accurate annotation. However, existing methods easily suffer from the scarcity and unreliability of annotations, because manual labeling is expensive and noisy labels are inevitably introduced during labeling. It is therefore worthwhile to explore learning from noisy labels in cross-modal retrieval. In this work, we propose a novel framework entitled Dual-Mix for Cross-Modal Retrieval with noisy labels (DMCM). It consists of two components: mixing robust loss functions, and mixing augmentation for noisy samples. In the first mixing stage, the normalized generalized cross entropy and the mean absolute error are combined so that each compensates for the other's weaknesses. Then, after separating clean and noisy samples with a Beta Mixture Model, we mix these samples via augmentation to further mitigate the scarcity of labeled samples. Extensive experiments demonstrate the significant superiority of our DMCM.
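The two mixing stages described above can be sketched as follows. This is a minimal illustration based only on the abstract, not the authors' implementation: the function names (`dual_mix_loss`, `mixup`), the weighting parameters `alpha_w`/`beta_w`, and the exponent `q` are assumptions, and the standard forms of normalized generalized cross entropy, MAE, and mixup are used in place of any paper-specific variants.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ngce(probs, labels, q=0.7):
    # generalized cross entropy (1 - p_y^q)/q for the true class,
    # normalized by the sum of the same term over all classes
    num = (1.0 - probs[np.arange(len(labels)), labels] ** q) / q
    den = ((1.0 - probs ** q) / q).sum(axis=-1)
    return num / den

def mae(probs, labels):
    # mean absolute error vs. one-hot labels reduces to 2 * (1 - p_y)
    return 2.0 * (1.0 - probs[np.arange(len(labels)), labels])

def dual_mix_loss(logits, labels, alpha_w=1.0, beta_w=1.0, q=0.7):
    # first mixing stage: weighted combination of the two robust terms
    p = softmax(logits)
    return (alpha_w * ngce(p, labels, q) + beta_w * mae(p, labels)).mean()

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    # second mixing stage: convexly combine two samples (e.g. one judged
    # clean and one judged noisy by the Beta Mixture Model) with a
    # Beta(alpha, alpha)-distributed coefficient, as in standard mixup
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

A confidently correct prediction yields a small loss, while a noisy-label sample the model disagrees with is penalized only boundedly, which is the usual motivation for combining NGCE with MAE.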

Keywords:
Cross-modal retrieval, Noisy labels, Computer science, Artificial intelligence, Deep neural networks, Cross entropy, Machine learning, Pattern recognition, Artificial neural network

Metrics

Cited By: 2
FWCI (Field-Weighted Citation Impact): 1.06
References: 37
Citation Normalized Percentile: 0.64
Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Machine Learning and Data Classification
Physical Sciences → Computer Science → Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition