Quanjun Zhang, Chunrong Fang, Yi Zheng, Yaxin Zhang, Yuan Zhao, Rubing Huang, Jianyi Zhou, Yun Yang, Tao Zheng, Zhenyu Chen
Unit testing validates the correctness of the units of the software system under test and serves as a cornerstone for improving software quality and reliability. To reduce the manual effort of writing unit tests, several techniques have been proposed to generate test assertions automatically, including deep learning (DL)-based, retrieval-based, and integration-based approaches. Among them, recent integration-based approaches inherit from both DL-based and retrieval-based approaches and are considered state-of-the-art. Despite being promising, such integration-based approaches suffer from inherent limitations, such as retrieving assertions by lexical matching while ignoring meaningful code semantics, and generating assertions from a limited training corpus. In this paper, we propose a novel Retrieval-Augmented Deep Assertion Generation approach, named RetriGen, based on a hybrid assertion retriever and a pre-trained language model (PLM)-based assertion generator. Given a focal-test, RetriGen first builds a hybrid assertion retriever to search for the most relevant test-assert pair from external codebases. The retrieval process takes both lexical and semantic similarity into account via a token-based and an embedding-based retriever, respectively. RetriGen then treats assertion generation as a sequence-to-sequence task and designs a PLM-based assertion generator to predict a correct assertion from historical test-assert pairs and the retrieved external assertion. Although our concept is general and can be adapted to various off-the-shelf encoder-decoder PLMs, we implement RetriGen on top of the recent CodeT5 model. We conduct extensive experiments evaluating RetriGen against six state-of-the-art approaches across two large-scale datasets and two metrics.
The experimental results demonstrate that RetriGen achieves 57.66% accuracy and 73.24% CodeBLEU, outperforming all baselines with average improvements of 50.66% and 14.14%, respectively. Furthermore, RetriGen generates 1598 and 1818 unique correct assertions that all baselines fail to produce, 3.71× and 4.58× more than the most recent approach EditAS. We also demonstrate that adopting other PLMs provides substantial gains, e.g., four additionally utilized PLMs outperform EditAS by 7.91% to 12.70% in accuracy, indicating the generalizability of RetriGen. Overall, our study highlights the promising future of fine-tuning off-the-shelf PLMs to generate accurate assertions by incorporating external knowledge sources.
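The hybrid retrieval step described in the abstract can be illustrated with a minimal sketch: combine a token-based (lexical) score with an embedding-based (semantic) score to rank candidate test-assert pairs from an external codebase. All function names, the Jaccard/cosine scoring, the toy bag-of-tokens embedding, and the weighting parameter are illustrative assumptions, not RetriGen's actual implementation.

```python
# Hypothetical sketch of a hybrid assertion retriever: rank external
# test-assert pairs by a weighted sum of lexical and semantic similarity.
# The toy_embedding below stands in for a learned encoder; RetriGen's
# real retriever is not reproduced here.
import math
from collections import Counter


def lexical_similarity(query_tokens, candidate_tokens):
    """Jaccard overlap between the two token sets (lexical score)."""
    q, c = set(query_tokens), set(candidate_tokens)
    return len(q & c) / len(q | c) if q | c else 0.0


def toy_embedding(tokens):
    """Stand-in for a learned encoder: bag-of-tokens count vector."""
    return Counter(tokens)


def cosine_similarity(vec_a, vec_b):
    """Cosine between two sparse count vectors (semantic score)."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(focal_test_tokens, corpus, alpha=0.5):
    """Return the test-assert pair with the highest combined score.

    corpus: list of (test_tokens, assertion) pairs; alpha weights the
    lexical score against the semantic score.
    """
    def score(test_tokens):
        lex = lexical_similarity(focal_test_tokens, test_tokens)
        sem = cosine_similarity(toy_embedding(focal_test_tokens),
                                toy_embedding(test_tokens))
        return alpha * lex + (1 - alpha) * sem

    return max(corpus, key=lambda pair: score(pair[0]))


corpus = [
    (["assertEquals", "list", "size"], "assertEquals(2, list.size());"),
    (["assertTrue", "map", "isEmpty"], "assertTrue(map.isEmpty());"),
]
query = ["assertEquals", "list", "size", "get"]
print(hybrid_retrieve(query, corpus)[1])  # prints the closest assertion
```

The retrieved assertion would then be concatenated with the focal-test as additional input to the sequence-to-sequence generator.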