DSmith: Compiler Fuzzing through Generative Deep Learning Model with Attention

Haoran Xu; Yongjun Wang; Shuhui Fan; Peidai Xie; Aizhi Liu

doi:10.1109/ijcnn48605.2020.9206911

JOURNAL ARTICLE

Year: 2020 Pages: 1-9

Abstract

Compiler fuzzing is a technique for testing the functionality of a compiler. It requires well-formed test cases (i.e., programs) that are lexically and syntactically correct, so that they pass the compiler's parsing stage. Recent compiler fuzzing methods generate test cases with deep neural networks, which learn a language model of regular programs in order to keep test case quality high. However, most of these methods fail to capture long-distance syntactic dependencies (e.g., paired curly braces) in a program. As a result, they may generate test cases with syntax errors, which are rejected at the parsing stage and therefore never exercise the rest of the compiler. In this paper, we propose a framework, DSmith, that captures long-distance syntactic dependencies for robust test case generation. Specifically, DSmith memorizes the hidden state of each token in a program and uses the interactions between these hidden states to embed the long-distance dependencies between tokens. It then adopts an encoder-decoder architecture that incorporates this embedding to build a language model of regular programs. Finally, DSmith uses the learned language model to generate test cases according to four novel generation strategies, which significantly increase the diversity of the test cases. Extensive experiments show that, compared with state-of-the-art methods, DSmith increases the parsing pass rate of the generated programs by an average of 19% and significantly improves code coverage of the compiler. Benefiting from the high pass rate and broad code coverage, DSmith has found eleven previously unknown bugs in currently supported versions of GCC.
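The idea of keeping every token's hidden state and letting a later step interact with all of them can be illustrated with a small attention sketch. This is not DSmith's implementation: the abstract does not give the scoring function, so dot-product scoring is assumed here, and the names `attend`, `hidden_states`, and `query`, along with all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, hidden_states):
    """Dot-product attention over stored per-token hidden states.

    Returns the attention weights and the weighted sum of the states
    (the context vector). Dot-product scoring is an assumption made
    for illustration only.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


# Made-up hidden states for the tokens "{", "x", ";".
tokens = ["{", "x", ";"]
hidden_states = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.5]]

# Hypothetical decoder state at the point where a closing "}" is due.
query = [3.0, 0.0]

weights, context = attend(query, hidden_states)
for token, weight in zip(tokens, weights):
    print(f"{token!r}: {weight:.3f}")
```

In this toy setup the query aligns with the stored state of the opening brace, so that token receives most of the attention weight regardless of how many tokens lie in between. That is the kind of long-distance link a model relying only on its most recent state tends to lose.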

Keywords:

Fuzz testing; Computer science; Compiler; Parsing; Compiler construction; Compiler correctness; Programming language; Abstract syntax tree; Optimizing compiler; Code coverage; Syntax; Code generation; Artificial intelligence; Dead code elimination; Software; Operating system; Object code

Metrics

FWCI (Field Weighted Citation Impact): 2.02

Citation Normalized Percentile: 0.85

Topics

Software Testing and Debugging Techniques

Physical Sciences → Computer Science → Software

Software Engineering Research

Physical Sciences → Computer Science → Information Systems

Software System Performance and Reliability

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Compiler fuzzing through deep learning

GeMuFuzz: Integrating Generative and Mutational Fuzzing with Deep Learning

Compiler Fuzzing

DeepDiffer: Find Deep Learning Compiler Bugs via Priority-guided Differential Fuzzing

Fuzzing software with deep learning