JOURNAL ARTICLE

Software Defect Prediction Using Deep Semantic Feature Learning

Abstract

Software defect prediction (SDP) can improve software reliability by flagging likely defects in source code before they cause failures. As recent research has shown, however, building defect prediction models remains challenging. Many methods for predicting source code defects have been proposed over time, but most previous work has concentrated on traditional feature extraction and modelling. Traditional features often fail to capture the contextual information in source code files that is needed to build reliable prediction models. Semantic feature techniques for defect prediction, by contrast, have matured only recently. Such techniques make it possible to predict likely defects by automatically extracting contextual information from source code files. We propose using deep learning, a powerful representation-learning approach, to bridge the gap between defect prediction data and program semantics. We use a Long Short-Term Memory (LSTM) model to automatically extract semantic features from word vectors derived from program abstract syntax trees (ASTs) for file-level defect prediction and from source code changes for change-level defect prediction. We evaluate the approach on two tasks, file-level defect prediction and change-level defect prediction, each in both within-project and cross-project settings. Our findings show that LSTM-AST-based semantic features can substantially improve performance on fault identification tasks. They also show that the LSTM-AST deep learning model is a practical choice for datasets with medium to low sparsity ratios and that it typically outperforms other machine learning models on datasets with high and extremely high sparsity ratios.
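
The first stage of the pipeline described in the abstract, turning a source file into a token sequence from its AST and then into integer ids suitable for an embedding layer feeding an LSTM, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it parses Python source with the standard `ast` module purely for convenience, and the function names (`ast_token_sequence`, `build_vocab`, `encode`), the choice of which node types to keep, and the padding length are all assumptions introduced here.

```python
import ast

# Hypothetical choice of control-flow node types to keep; the article's
# abstract does not specify which AST nodes are selected.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.Return, ast.Raise)

PAD, UNK = "<pad>", "<unk>"


def ast_token_sequence(source):
    """Return a depth-first sequence of tokens taken from the AST of `source`."""
    tokens = []

    def visit(node):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            # Declarations keep their name so that identifiers carry meaning.
            tokens.append(f"{type(node).__name__}:{node.name}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append(f"Call:{node.func.id}")
        elif isinstance(node, CONTROL_NODES):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(sequences):
    """Map each distinct token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to `max_len`."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


sample = "def f(x):\n    if x:\n        return g(x)\n    return 0\n"
sample_tokens = ast_token_sequence(sample)
sample_vocab = build_vocab([sample_tokens])
sample_ids = encode(sample_tokens, sample_vocab, max_len=8)
```

In a full model, fixed-length id vectors such as `sample_ids` would be passed through an embedding layer to obtain word vectors, and the LSTM would consume those vectors to produce the semantic features used by the defect classifier.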

Keywords:
Computer science; Source code; Artificial intelligence; Deep learning; Software bug; Semantics (computer science); Machine learning; Syntax; Feature (linguistics); Natural language processing; Predictive modelling; Software; Code (set theory); Data mining; Programming language; Set (abstract data type)

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.24
Refs: 20
Citation Normalized Percentile: 0.83

Topics

Software Engineering Research
Physical Sciences → Computer Science → Information Systems
Software Reliability and Analysis Research
Physical Sciences → Computer Science → Software
Software System Performance and Reliability
Physical Sciences → Computer Science → Computer Networks and Communications