JOURNAL ARTICLE

Improve Code Summarization via Prompt-Tuning CodeT5

Huanzhen LI

Year: 2023 Journal:   Wuhan University Journal of Natural Sciences Vol: 28 (6)Pages: 474-482   Publisher: Springer Science+Business Media

Abstract

Code comments are crucial in software engineering, aiding in program maintenance and code reuse. The process of generating clear and descriptive code comments, outlining code functionality, is called code summarization. Existing code summarization methods are typically trained using transformer-based models. However, these trained models often possess limited parameters and lack specific training tasks, hindering their ability to capture code semantics effectively. This paper uses a high-capacity pre-trained model, CodeT5, for code summarization. CodeT5 is designed with an encoder-decoder architecture that excels in code summarization tasks. Furthermore, we adopt a novel paradigm, "pre-train, prompt, predict", to unlock the knowledge embedded within CodeT5. We devise a prompt template to convert input code into code prompts and fine-tune CodeT5 with these prompts—a process we term prompt tuning. Our effectiveness experiments demonstrate that prompt tuning CodeT5 with only 40% of the dataset can achieve comparable performance to fine-tuning CodeT5 with 100% of the dataset. This means our approach is applicable in few-shot learning scenarios. Additionally, our prompt learning method is not sensitive to the size of the tuning dataset. Our practicality experiments show that the performance of prompt-tuned CodeT5 far surpasses that of transformer-based models trained on code-comment datasets collected from Stack Overflow.

Keywords:
Computer science Automatic summarization Encoder Transformer Code (set theory) Source code Code review Artificial intelligence Reuse Code reuse Process (computing) KPI-driven code analysis Software Static program analysis Machine learning Programming language Software development

Metrics

4
Cited By
2.47
FWCI (Field Weighted Citation Impact)
23
Refs
0.90
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Software Testing and Debugging Techniques
Physical Sciences →  Computer Science →  Software
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
© 2026 ScienceGate Book Chapters — All rights reserved.