JOURNAL ARTICLE

Comparative Performance of Retrieval Augmented Generation Tourism Chatbots

Amar Al Farizi, Primandani Arsi, Pungkas Subarkah

Year: 2026
Journal: Indonesian Journal of Innovation Studies
Vol: 27 (1)
Publisher: Universitas Muhammadiyah Sidoarjo

Abstract

General Background: The rapid adoption of artificial intelligence in smart tourism has increased the use of contextual chatbots to deliver destination information efficiently. Specific Background: However, tourism chatbots based on Large Language Models frequently encounter information hallucination, reducing reliability when handling dynamic and local tourism data. Knowledge Gap: Existing studies mainly focus on rule-based or single-model chatbot implementations and provide limited comparative evaluation of Retrieval Augmented Generation configurations combining embedding models and Large Language Models. Aims: This study aims to comparatively evaluate multiple Retrieval Augmented Generation configurations to identify the most suitable combination for contextual tourism chatbots and to analyze differences between large multilingual and small monolingual embedding models using a local tourism dataset. Results: Experimental evaluation using data from 49 tourist destinations in Banyumas Regency shows that the Multilingual-E5-Large embedding model consistently achieves perfect Precision, Recall, and F1-Score across all tested Large Language Models. The combination of Multilingual-E5-Large and GPT-4.1-Mini demonstrates the most balanced performance, achieving a BERTScore F1 of 0.7515 with an average response time of 1.555 seconds. Novelty: This research provides a systematic comparative assessment of embedding capacity and Large Language Model selection within a unified Retrieval Augmented Generation framework for tourism chatbots. Implications: The findings offer practical guidance for selecting model configurations that ensure accurate retrieval, high-quality responses, and efficient system performance in contextual tourism information services. 
Highlights
• Multilingual embedding models deliver consistently higher retrieval accuracy across all tested configurations
• GPT-4.1-Mini produces the most balanced generative quality and response latency
• Embedding model selection plays a more decisive role than language model variation

Keywords: Retrieval Augmented Generation; Tourism Chatbot; Large Language Model; Embedding Model; Comparative Evaluation
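
The abstract reports retrieval quality as Precision, Recall, and F1-Score. As a minimal sketch of how such per-query scores are commonly computed, assuming a set-based comparison between retrieved and relevant document identifiers (the study's exact evaluation protocol is not described here, and the function name `retrieval_scores` and the document IDs are hypothetical):

```python
def retrieval_scores(retrieved, relevant):
    """Set-based precision, recall and F1 for a single query.

    `retrieved` and `relevant` are iterables of document identifiers.
    This is an illustrative convention, not the paper's confirmed method.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # correctly retrieved documents

    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical document IDs, for illustration only.
perfect = retrieval_scores(["doc_a", "doc_b"], ["doc_a", "doc_b"])
partial = retrieval_scores(["doc_a", "doc_c"], ["doc_a", "doc_b"])
print(perfect)  # (1.0, 1.0, 1.0)
print(partial)  # (0.5, 0.5, 0.5)
```

Under this convention, a "perfect" result such as the one reported for Multilingual-E5-Large would mean that every retrieved document is relevant and no relevant document is missed.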

Keywords:
Tourism; Selection (genetic algorithm); Language model; Embedding; Focus (optics); Implementation; Chatbot; Quality (philosophy)

Topics

AI in Service Interactions
Physical Sciences →  Computer Science →  Artificial Intelligence
Information Retrieval and Data Mining
Physical Sciences →  Computer Science →  Information Systems
Digital Marketing and Social Media
Social Sciences →  Social Sciences →  Sociology and Political Science