JOURNAL ARTICLE

IRAG: Iterative Retrieval Augmented Generation for SLU

Abstract

This paper proposes an iterative retrieval augmented generation (RAG) approach to improve spoken language understanding (SLU). First, a pretrained automatic speech recognition (ASR) encoder is used to retrieve similar utterances from the training set. The texts and intent labels of the retrieved utterances are then formulated as prompts to guide the SLU decoder, and an added prompt attention mechanism strengthens attention between the generated output and the prompts. Search and generation are repeated for at most three iterations, with an earlier exit if similarity scores do not improve. Experiments demonstrate that the proposed approach substantially outperforms conventional end-to-end and cascaded SLU models in intent prediction from speech. This highlights the efficacy of incorporating relevant external knowledge through retrieval-based prompting, and the iterative process allows progressive refinement of predictions. Overall, the work shows promise for advancing SLU via iterative RAG.
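
The control flow described in the abstract (retrieve, build a prompt, generate, repeat up to three times, stop early when similarity stops improving) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: plain float vectors stand in for ASR encoder embeddings, the prompt attention mechanism is not modelled, and `generate` and `refine` are hypothetical callables. In particular, the abstract does not say how the query is updated between iterations, so `refine` is purely an assumption.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, str, str]  # (embedding, transcript, intent label)
Hit = Tuple[float, str, str]       # (similarity, transcript, intent label)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, index: List[Example], k: int) -> List[Hit]:
    """Return the k training examples most similar to the query."""
    scored = [(cosine(query, emb), text, intent) for emb, text, intent in index]
    return sorted(scored, key=lambda hit: hit[0], reverse=True)[:k]


def build_prompt(hits: List[Hit]) -> str:
    """Format retrieved transcripts and intent labels as one prompt string."""
    return " | ".join(f"text: {text} ; intent: {intent}" for _, text, intent in hits)


def iterative_rag(
    query: Vector,
    index: List[Example],
    generate: Callable[[Vector, str], str],   # hypothetical SLU decoder
    refine: Callable[[Vector, str], Vector],  # hypothetical query update (assumption)
    k: int = 2,
    max_iters: int = 3,
) -> Tuple[Optional[str], int]:
    """Run retrieval and generation for up to max_iters rounds.

    Stops early when the top similarity score does not improve.
    Returns the last prediction and the number of completed rounds.
    """
    best_score = -math.inf
    prediction: Optional[str] = None
    rounds = 0
    for _ in range(max_iters):
        hits = retrieve(query, index, k)
        top_score = hits[0][0] if hits else -math.inf
        if top_score <= best_score:
            break  # early exit: similarity did not improve
        best_score = top_score
        prediction = generate(query, build_prompt(hits))
        query = refine(query, prediction)
        rounds += 1
    return prediction, rounds
```

With a `refine` that leaves the query unchanged, the loop exits after one round, because the second retrieval cannot score higher than the first. A `refine` that keeps improving the match runs for the full three rounds.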

Keywords:
Computer science; Encoder; Iterative refinement; Set (abstract data type); Iterative and incremental development; Similarity (geometry); Iterative method; Process (computing); Artificial intelligence; Decoding methods; Speech recognition; Information retrieval; Natural language processing; Algorithm; Image (mathematics)

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 3.19
Refs: 25
Citation Normalized Percentile: 0.88

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence