JOURNAL ARTICLE

Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models

Quanjun Zhang, Chunrong Fang, Yi Zheng, Yaxin Zhang, Yuan Zhao, Rubing Huang, Jianyi Zhou, Yun Yang, Tao Zheng, Zhenyu Chen

Year: 2025 · Journal: ACM Transactions on Software Engineering and Methodology · Publisher: Association for Computing Machinery

Abstract

Unit testing validates the correctness of the units of the software system under test and serves as the cornerstone of improving software quality and reliability. To reduce the manual effort of writing unit tests, several techniques have been proposed to generate test assertions automatically, including deep learning (DL)-based, retrieval-based, and integration-based ones. Among them, recent integration-based approaches inherit from both DL-based and retrieval-based approaches and are considered state-of-the-art. Despite being promising, such integration-based approaches suffer from inherent limitations, such as retrieving assertions with lexical matching while ignoring meaningful code semantics, and generating assertions with a limited training corpus. In this paper, we propose a novel Retrieval-Augmented Deep Assertion Generation approach, namely RetriGen, based on a hybrid assertion retriever and a pre-trained language model (PLM)-based assertion generator. Given a focal-test, RetriGen first builds a hybrid assertion retriever to search for the most relevant test-assert pair from external codebases. The retrieval process takes both lexical similarity and semantic similarity into account via a token-based and an embedding-based retriever, respectively. RetriGen then treats assertion generation as a sequence-to-sequence task and designs a PLM-based assertion generator to predict a correct assertion from historical test-assert pairs and the retrieved external assertion. Although our concept is general and can be adapted to various off-the-shelf encoder-decoder PLMs, we implement RetriGen on top of the recent CodeT5 model. We conduct extensive experiments to evaluate RetriGen against six state-of-the-art approaches across two large-scale datasets and two metrics.
The experimental results demonstrate that RetriGen achieves 57.66% accuracy and 73.24% CodeBLEU, outperforming all baselines with average improvements of 50.66% and 14.14%, respectively. Furthermore, RetriGen generates 1598 and 1818 unique correct assertions that all baselines fail to produce, 3.71X and 4.58X more than the most recent approach EditAS. We also demonstrate that adopting other PLMs provides substantial improvements, e.g., four additionally-utilized PLMs outperform EditAS by 7.91%∼12.70% in accuracy, indicating the generalizability of RetriGen. Overall, our study highlights the promising future of fine-tuning off-the-shelf PLMs to generate accurate assertions by incorporating external knowledge sources.
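The hybrid retrieval described in the abstract — scoring candidate test-assert pairs by both lexical (token-based) and semantic (embedding-based) similarity — can be sketched as follows. This is a minimal illustrative toy, not RetriGen's actual implementation: the paper uses a trained PLM encoder for embeddings, whereas this sketch substitutes a bag-of-words count vector, and the weighting scheme, function names, and example codebase are all hypothetical.

```python
# Hypothetical sketch of a hybrid assertion retriever: a candidate's score
# is a weighted sum of lexical (Jaccard) and semantic (cosine) similarity.
from collections import Counter
import math

def token_similarity(query_tokens, cand_tokens):
    """Jaccard overlap of token sets (lexical similarity)."""
    q, c = set(query_tokens), set(cand_tokens)
    return len(q & c) / len(q | c) if q | c else 0.0

def embed(tokens):
    """Toy embedding: bag-of-words count vector (stand-in for a PLM encoder)."""
    return Counter(tokens)

def cosine_similarity(v1, v2):
    dot = sum(v1[t] * v2[t] for t in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def retrieve(focal_test, codebase, alpha=0.5):
    """Return the test-assert pair with the highest combined score."""
    q_tokens = focal_test.split()
    q_vec = embed(q_tokens)

    def score(pair):
        c_tokens = pair["test"].split()
        lex = token_similarity(q_tokens, c_tokens)
        sem = cosine_similarity(q_vec, embed(c_tokens))
        return alpha * lex + (1 - alpha) * sem

    return max(codebase, key=score)

# Hypothetical external codebase of test-assert pairs.
codebase = [
    {"test": "assertEquals expected list size", "assert": "assertEquals(3, list.size());"},
    {"test": "assertTrue stack empty check", "assert": "assertTrue(stack.isEmpty());"},
]
best = retrieve("check that the list size equals expected", codebase)
print(best["assert"])  # → assertEquals(3, list.size());
```

The retrieved assertion would then be concatenated with the focal-test as input to the sequence-to-sequence generator, giving the PLM an external exemplar to ground its prediction.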

Keywords:
Computer science · Assertion · Artificial intelligence · Natural language processing · Programming language

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 16.49
Refs: 42
Citation Normalized Percentile: 0.94 (in top 10%)


Topics

Software Testing and Debugging Techniques (Physical Sciences → Computer Science → Software)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)