JOURNAL ARTICLE

Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation

Abstract

Visual Word Sense Disambiguation (VWSD) is a novel and challenging task whose goal is to retrieve, from a set of candidates, the image that best represents the meaning of an ambiguous word within a given context. In this paper, we take a substantial step towards understanding this task by applying a varied set of approaches. Since VWSD is primarily a text-image retrieval task, we explore the latest transformer-based methods for multimodal retrieval. Additionally, we utilize Large Language Models (LLMs) as knowledge bases to enhance the given phrases and resolve ambiguity related to the target word. We also study VWSD as a unimodal problem by converting it to text-to-text and image-to-image retrieval, as well as to question answering (QA), to fully explore the capabilities of relevant models. To tap into the implicit knowledge of LLMs, we experiment with Chain-of-Thought (CoT) prompting to guide explainable answer generation. Finally, we train a learning-to-rank (LTR) model to combine our different modules, achieving competitive ranking results. Extensive experiments on VWSD yield insights that can guide future work.
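
The text-image retrieval framing described in the abstract can be illustrated with a minimal sketch. It assumes the context phrase and each candidate image have already been embedded into a shared vector space by some multimodal encoder; the toy vectors, file names, and the helper `rank_candidates` are hypothetical and are not taken from the paper. The sketch only shows the ranking step (cosine similarity between phrase and images), not the LLM-based phrase enhancement or the LTR combination.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(phrase_vec, image_vecs):
    """Return candidate image names sorted from most to least similar.

    phrase_vec: embedding of the context phrase (hypothetical values).
    image_vecs: mapping from image name to its embedding.
    """
    scores = {name: cosine(phrase_vec, vec) for name, vec in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings standing in for encoder outputs (assumed, for illustration only).
phrase_vec = [0.9, 0.1, 0.0]  # e.g. an embedding of the phrase "river bank"
image_vecs = {
    "river_bank.jpg": [0.8, 0.2, 0.1],
    "bank_building.jpg": [0.1, 0.9, 0.2],
    "park_bench.jpg": [0.0, 0.1, 0.9],
}

ranking = rank_candidates(phrase_vec, image_vecs)
print(ranking)
```

In a real setting the vectors would come from a pretrained vision-language model, and the per-candidate scores from several such modules could serve as input features to a ranker.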

Keywords:
Computer science, Natural language processing, Ambiguity, Ranking (information retrieval), Artificial intelligence, Set (abstract data type), Rank (graph theory), Question answering, Task (project management), Transformer, Information retrieval, Language model, Context (archaeology), Word (group theory), Linguistics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.55
Refs: 29
Citation Normalized Percentile: 0.63

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
