Abstract

Multilingual text-video retrieval methods have improved significantly in recent years, but performance for languages other than English still lags behind. We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval. Motivated by the fact that English text-video retrieval outperforms retrieval in other languages, we train a student model that takes input text in different languages to match the cross-modal predictions of teacher models that take input text in English. We propose a cross-entropy-based objective that forces the distribution over the student's text-video similarity scores to be similar to that of the teacher models. We introduce a new multilingual video dataset, Multi-YouCook2, by translating the English captions in the YouCook2 video dataset into 8 other languages. Our method improves multilingual text-video retrieval performance on Multi-YouCook2 and on several other datasets, such as Multi-MSRVTT and VATEX. We also analyze the effectiveness of different multilingual text models as teachers.
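
The cross-entropy objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single teacher, a softmax over the candidate videos for each text query, and a temperature parameter, none of which are specified in the abstract. The function names and the toy similarity scores are hypothetical.

```python
import math

def softmax(scores, temperature=1.0):
    # Turn one row of text-video similarity scores into a distribution
    # over candidate videos (temperature is an assumed hyperparameter).
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0).
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def distillation_loss(student_sim, teacher_sim, temperature=1.0):
    # Each row holds one text query's similarity to every candidate video.
    # teacher_sim: scores computed from the English text (teacher).
    # student_sim: scores computed from the same caption in another language.
    losses = []
    for s_row, t_row in zip(student_sim, teacher_sim):
        p_teacher = softmax(t_row, temperature)
        q_student = softmax(s_row, temperature)
        losses.append(cross_entropy(p_teacher, q_student))
    return sum(losses) / len(losses)

# Toy example: 2 text queries, 3 candidate videos (made-up scores).
teacher_sim = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1]]
student_sim = [[0.5, 0.3, 0.4], [0.4, 0.5, 0.3]]

loss_mismatch = distillation_loss(student_sim, teacher_sim, temperature=0.1)
loss_match = distillation_loss(teacher_sim, teacher_sim, temperature=0.1)
print(loss_mismatch, loss_match)
```

Under these assumptions the loss is smallest when the student's distribution equals the teacher's, so minimizing it pulls the non-English similarity scores toward the English ones.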

Keywords:
Computer science; Natural language processing; Artificial intelligence; Modal; Text retrieval; Information retrieval; Similarity (geometry); Video retrieval; Image (mathematics)
