JOURNAL ARTICLE

Syntax-Aware Retrieval Augmented Code Generation

Abstract

Neural code generation models are nowadays widely adopted to generate code from natural language descriptions automatically. Recently, pre-trained neural models equipped with token-level retrieval capabilities have exhibited great potentials in neural machine translation. However, applying them directly to code generation experience challenges: the use of the retrieval-based mechanism inevitably introduces extraneous noise to the generation process, resulting in even syntactically incorrect code. Computationally, such models necessitate frequent searches of the cached datastore, which turns out to be time-consuming. To address these issues, we propose kNN-TRANX, a token-level retrieval augmented code generation method. kNN-TRANX allows for searches in smaller datastores tailored for the code generation task. It leverages syntax constraints for the retrieval of datastores, which reduces the impact of retrieve noise. We evaluate kNN-TRANX on two public datasets and the experimental results confirm the effectiveness of our approach.

Keywords:
Computer science Code generation Security token Code (set theory) Syntax Machine translation Artificial intelligence Natural language generation Natural language processing Machine learning Natural language Programming language

Metrics

7
Cited By
1.79
FWCI (Field Weighted Citation Impact)
35
Refs
0.85
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
© 2026 ScienceGate Book Chapters — All rights reserved.