JOURNAL ARTICLE

Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning

Rui Cai, Jianfeng Dong, Tianxiang Liang, Yonghui Liang, Yabing Wang, Xun Yang, Xun Wang, Meng Wang

Year: 2024, Journal: IEEE Transactions on Knowledge and Data Engineering, Vol: 36 (11), Pages: 5860-5873, Publisher: IEEE Computer Society

Abstract

Cross-lingual cross-modal retrieval aims to leverage human-labeled annotations in a source language to build cross-modal retrieval models for a new target language, since manually annotated datasets are lacking in low-resource (target) languages. In contrast to the rapid progress in monolingual cross-modal retrieval, less research has focused on cross-modal retrieval in the cross-lingual scenario. A straightforward way to obtain target-language labeled data is to translate source-language datasets using machine translation (MT). However, because MT is imperfect, it tends to introduce noise during translation, which corrupts the textual embeddings and thereby compromises retrieval performance. To alleviate this, we propose Noise-Robust Fine-tuning (NRF), which tries to extract clean textual information from a possibly noisy target-language input with the guidance of its source-language counterpart. In addition, contrastive learning involving different modalities is performed to strengthen the noise robustness of our model. Unlike traditional cross-modal retrieval methods, which employ only image/video-text paired data for fine-tuning, in NRF selected parallel data plays a key role in improving the noise-filtering ability of our model. Extensive experiments are conducted on three video-text and image-text retrieval benchmarks across different target languages, and the results demonstrate that our method significantly improves overall performance without using any image/video-text paired data in the target languages.
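
The abstract does not give the loss functions, so the following is only a minimal sketch of the general idea of contrastive alignment, under stated assumptions. It uses a symmetric InfoNCE loss, a common choice for contrastive learning that is swapped in here for illustration and is not confirmed as the paper's objective. The function names, the temperature value, the toy embeddings, and the way the two terms are summed are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss: queries[i] is paired with keys[i]; other keys act as negatives."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(q)


def symmetric_contrastive_loss(a, b, temperature=0.07):
    """Average of the a-to-b and b-to-a InfoNCE losses."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))


# Toy embeddings (hypothetical values, one row per sample).
source_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target_text = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.8]]  # noisy MT output
visual = [[0.8, 0.0, 0.1], [0.0, 0.7, 0.2], [0.1, 0.0, 0.9]]

# Assumed combination: align target text with its source-language counterpart
# (parallel data) and source text with the visual modality (paired data).
loss = (symmetric_contrastive_loss(target_text, source_text)
        + symmetric_contrastive_loss(source_text, visual))
print(loss)
```

In this sketch the target-language text never appears in a pair with visual data, which mirrors the abstract's claim that no image/video-text paired data is used for the target languages. How NRF actually selects parallel data and filters noise is not specified here.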

Keywords:
Computer science, Modal, Noise (video), Cross-correlation, Artificial intelligence, Mathematics, Materials science

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 3.19
Refs: 84
Citation Normalized Percentile: 0.88

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

Yabing Wang, Jianfeng Dong, Tianxiang Liang, Minsong Zhang, Rui Cai, Xun Wang

Journal: Proceedings of the 30th ACM International Conference on Multimedia, Year: 2022, Pages: 422-433

JOURNAL ARTICLE

CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer

Yabing Wang, Fan Wang, Jianfeng Dong, Hao Luo

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2024, Vol: 38 (6), Pages: 5651-5659

JOURNAL ARTICLE

Noise-Robust Generative Hashing for Cross-Modal Retrieval

Zequn Wang, Tianshi Wang, Fengling Li, Jingjing Li, Lei Zhu

Journal: ACM Transactions on Multimedia Computing Communications and Applications, Year: 2025