JOURNAL ARTICLE

Retrieval-based neural source code summarization

Abstract

Source code summarization aims to automatically generate concise natural language summaries of source code, in order to help developers better understand and maintain it. Traditional work generates a source code summary using information retrieval techniques, which select terms from the original source code or adapt the summaries of similar code snippets. Recent studies adopt Neural Machine Translation techniques and generate summaries from code snippets using encoder-decoder neural networks. However, these neural approaches favor high-frequency words in the corpus and have trouble with low-frequency ones. In this paper, we propose a retrieval-based neural source code summarization approach in which we enhance the neural model with the most similar code snippets retrieved from the training set. Our approach can take advantage of both neural and retrieval-based techniques. Specifically, we first train an attentional encoder-decoder model on the code snippets and summaries in the training set. Second, given an input code snippet for testing, we retrieve its two most similar code snippets in the training set, one in terms of syntax and one in terms of semantics. Third, we encode the input and the two retrieved code snippets, and predict the summary by fusing them during decoding. We conduct extensive experiments to evaluate our approach, and the experimental results show that it improves over state-of-the-art methods.
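
The retrieve-then-fuse steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which similarity measures or fusion rule are used, so everything here is an assumption made for illustration: token-set Jaccard overlap stands in for syntactic similarity, bag-of-tokens cosine stands in for semantic similarity, and fusion is a similarity-weighted average of next-word probability distributions. All function names (`tokenize`, `jaccard`, `cosine`, `retrieve`, `fuse`) are hypothetical, and the distributions passed to `fuse` would in practice come from a trained decoder, which is not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split a code snippet into identifier-like tokens (a crude lexer)."""
    return re.findall(r"[A-Za-z_]\w*", code)


def jaccard(a, b):
    """Token-set overlap; an assumed stand-in for syntactic similarity."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Bag-of-tokens cosine; an assumed stand-in for semantic similarity."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, similarity):
    """Return (index, score) of the training snippet most similar to query."""
    scores = [similarity(query, snippet) for snippet in corpus]
    best = max(range(len(corpus)), key=scores.__getitem__)
    return best, scores[best]


def fuse(distributions, weights):
    """Weighted average of next-word distributions, renormalized to sum to 1.

    Each distribution maps a candidate word to its probability; each weight
    says how much that source (input or retrieved snippet) is trusted.
    """
    fused = Counter()
    for dist, weight in zip(distributions, weights):
        for word, prob in dist.items():
            fused[word] += weight * prob
    total = sum(fused.values())
    if total == 0:
        return {}
    return {word: prob / total for word, prob in fused.items()}
```

In this sketch, `retrieve` would be called twice per test snippet, once with each similarity function, and the two returned scores could serve as the weights of the retrieved snippets in `fuse`, with the input snippet's own distribution given weight 1.0. That weighting scheme is a design choice of the sketch, not something the abstract specifies.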

Keywords:
Computer science; Automatic summarization; Source code; Code (set theory); Set (abstract data type); Encoder; Information retrieval; Artificial intelligence; Natural language processing; Snippet; Machine translation; Code generation; Semantics (computer science); Programming language; Key (lock)

Metrics

Cited By: 235
FWCI (Field Weighted Citation Impact): 46.42
Refs: 71
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Bi-LSTM-Based Neural Source Code Summarization

Sarah Aljumah, Lamia Berriche

Journal: Applied Sciences, Year: 2022, Vol: 12 (24), Pages: 12587-12587

JOURNAL ARTICLE

Towards Retrieval-Based Neural Code Summarization: A Meta-Learning Approach

Ziyi Zhou, Huiqun Yu, Guisheng Fan, Zijie Huang, Kang Yang

Journal: IEEE Transactions on Software Engineering, Year: 2023, Vol: 49 (4), Pages: 3008-3031