JOURNAL ARTICLE

Zero-Shot Recommendations with Pre-Trained Large Language Models for Multimodal Nudging

Abstract

We present a method for zero-shot recommendation of multimodal, non-stationary content that leverages recent advances in generative AI. We propose rendering inputs of different modalities as textual descriptions and utilizing pre-trained large language models (LLMs) to obtain their numerical representations by computing semantic embeddings. Once unified representations of all content items are obtained, recommendation reduces to computing an appropriate similarity metric between them, with no additional learning required. We demonstrate our approach in a synthetic multimodal nudging environment, where the inputs consist of tabular, textual, and visual data.
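The pipeline described in the abstract can be sketched as follows. The `embed` function is a hypothetical stand-in for a pre-trained LLM embedding call (in practice, an embedding API or sentence encoder); here it is replaced by a toy bag-of-words vectorizer so the sketch runs standalone, and the item descriptions are invented examples of multimodal content rendered as text.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a pre-trained LLM semantic embedding:
    # a sparse bag-of-words count vector over lowercase tokens.
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def recommend(user_desc: str, item_descs: list[str], k: int = 1) -> list[str]:
    # Zero-shot recommendation: embed the user's textual description
    # and every item description, then rank items by similarity.
    # No training or fine-tuning is involved.
    u = embed(user_desc)
    ranked = sorted(item_descs, key=lambda d: cosine(u, embed(d)), reverse=True)
    return ranked[:k]

# Items of different modalities rendered as textual descriptions
# (a tabular record, an image caption, a text snippet) -- hypothetical data.
items = [
    "table row: user ran 5 km and burned 400 kcal",
    "image of a fresh salad bowl with vegetables",
    "article about late-night snacking habits",
]
print(recommend("fresh vegetables and a salad", items, k=1))
```

With a real embedding model, only `embed` changes; the ranking step stays the same, which is what makes the approach zero-shot.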

Keywords:
Computer science; Generative AI; Large language models; Zero-shot learning; Multimodal data; Semantic similarity; Natural language processing; Information retrieval; Machine learning

Metrics

- Cited by: 8
- FWCI (Field-Weighted Citation Impact): 2.04
- References: 51
- Citation Normalized Percentile: 0.86
- Is in top 1%
- Is in top 10%

Topics

- Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
- Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
- Image Retrieval and Classification Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)