JOURNAL ARTICLE

Coverage-guided fuzzing for deep reinforcement learning systems

Xiaohui Wan, Tiancheng Li, Weibin Lin, Yi Cai, Zheng Zheng

Year: 2024 | Journal: Journal of Systems and Software | Vol: 210 | Article: 111963 | Publisher: Elsevier BV

Abstract

While the past decade has seen growing demand for deep reinforcement learning (DRL) to solve real-world problems across many domains, the reliability of DRL systems has become a growing concern. In particular, DRL agents are often trained on data drawn from a potentially biased distribution over environmental settings, so a trained agent can fail in certain cases despite high average-case performance. It is therefore necessary and urgent to test DRL agents adequately to ensure the reliability of practical DRL systems. However, because of fundamental differences in programming paradigm and development process, traditional software testing methodology cannot be applied directly to DRL systems. We therefore introduce a novel testing framework for DRL systems that aims to generate diverse test cases capable of driving a DRL system to fail. Specifically, we design, implement and evaluate DRLFuzz, a coverage-guided fuzzing (CGF) framework for systematically testing DRL systems. Experimental results demonstrate that DRLFuzz efficiently discovers diverse failures in different DRL systems across various benchmark tasks. Compared with a random search baseline, DRLFuzz generates 60% more failed cases in general. In addition, the diversity of the failed cases generated by DRLFuzz increases by 4.6% to 14.1% in terms of mean pairwise distance (MPD). Finally, our experiments indicate that the failed cases generated by DRLFuzz can be used to fine-tune the DRL agent, eliminating failures that result from inadequate exploration during training and thus improving the reliability of DRL systems.
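The abstract relies on two technical ideas: a coverage-guided fuzzing loop that keeps mutated inputs only when they reach new coverage, and mean pairwise distance (MPD) as a diversity measure over failed cases. The sketch below illustrates both on a toy problem. It is not DRLFuzz's implementation. The two-dimensional initial state, the 10x10 grid-cell coverage criterion, Gaussian mutation, Euclidean distance and every helper name (`run_episode`, `mutate`, `fuzz`, `mean_pairwise_distance`) are hypothetical assumptions chosen for illustration. MPD is taken here as the average distance over all n(n-1)/2 unordered pairs of the n failed cases.

```python
import itertools
import math
import random


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points.

    Assumption: each failed case is a fixed-length vector (e.g. an initial
    environment state) and distance is Euclidean.
    """
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def run_episode(state):
    """Hypothetical stand-in for rolling out a trained DRL agent.

    Returns (failed, coverage_key). The toy 'agent' fails whenever the
    initial state lands in a corner region it never learned to handle.
    """
    x, y = state
    failed = x > 0.8 and y < 0.2
    # Hypothetical coverage criterion: the 10x10 grid cell of the state.
    cell = (min(int(x * 10), 9), min(int(y * 10), 9))
    return failed, cell


def mutate(state, rng, sigma=0.1):
    """Gaussian perturbation of each coordinate, clipped to [0, 1]."""
    return tuple(min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in state)


def fuzz(seeds, iterations, rng):
    """Coverage-guided loop: mutants that reach a new cell join the corpus."""
    corpus = list(seeds)
    covered = set()
    failures = []
    for s in corpus:
        failed, cell = run_episode(s)
        covered.add(cell)
        if failed:
            failures.append(s)
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        failed, cell = run_episode(child)
        if failed:
            failures.append(child)
        if cell not in covered:
            # New coverage: keep the mutant as a seed for later mutation.
            covered.add(cell)
            corpus.append(child)
    return failures, covered


if __name__ == "__main__":
    rng = random.Random(0)
    failures, covered = fuzz([(0.5, 0.5)], iterations=2000, rng=rng)
    print(len(failures), "failures,", len(covered), "cells covered,",
          "MPD =", round(mean_pairwise_distance(failures), 3))
```

The corpus feedback is what separates this loop from plain random search, which would draw each test input independently; under the assumptions above, seeds that reach unexplored cells steer later mutations toward regions the agent has not yet been tested on.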

Keywords:
Fuzz testing, Reinforcement learning, Computer science, Reliability, Pairwise comparison, Benchmark, Artificial intelligence, Machine learning, Reliability engineering, Software engineering

Metrics

Cited by: 9
FWCI (Field Weighted Citation Impact): 5.75
References: 87
Citation Normalized Percentile: 0.93

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Software Testing and Debugging Techniques
Physical Sciences →  Computer Science →  Software