DISSERTATION

Anomaly Detection Using Graph Neural Network

Abstract

Detecting malicious behavior is becoming increasingly important as internet use grows. The problem can be formulated as an anomaly detection task on provenance data, where attacks appear as anomalies in system behavior. System-level data is far less available than network data, and research on system-level logs is correspondingly limited. However, monitoring operating system processes during program execution and identifying anomalous system calls offers broad coverage and generality, since a wide variety of malicious applications can be identified this way. Furthermore, logs of system processes and events are provenance data: a graph describing the relationships between all the elements that contributed to the creation of the data, which makes a Graph Neural Network (GNN) well suited to the task. Such data may also contain metadata, which tends to be complex and makes feature engineering difficult, so these features often see limited use. In this thesis, we address these issues in two ways. First, we exploit the graph-like structure of logs, in which processes enact events and spawn additional processes, and use a GNN to learn a representation of each event that encodes information about its neighboring events in an unsupervised manner. Second, we make use of complex features such as command arguments, which vary widely and cannot be used in their raw form as features in typical machine learning algorithms. Encoding these features with a system composed of transformer and variational autoencoder (VAE) models makes them usable in downstream algorithms such as a GNN or an anomaly detector. Combined, these two approaches improve anomaly detection AUROC on the BETH dataset by around 8 percent compared with the manually engineered features alone.
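
The idea of encoding each event together with its neighboring events can be illustrated with a minimal sketch. This is not the thesis's model: the records, the field layout (process ID, parent process ID, feature vector), and the helper names `build_graph` and `aggregate` are all hypothetical, and a single round of plain mean aggregation stands in for a trained GNN layer.

```python
from collections import defaultdict

# Hypothetical log records: (process_id, parent_process_id, feature_vector).
# The values are illustrative only and are not taken from the BETH dataset.
events = [
    (1, 0, [1.0, 0.0]),
    (2, 1, [0.0, 1.0]),
    (3, 1, [1.0, 1.0]),
    (4, 2, [0.0, 0.0]),
]


def build_graph(records):
    """Link each process to its parent (undirected), skipping unknown parents."""
    known = {pid for pid, _, _ in records}
    neighbors = defaultdict(set)
    for pid, ppid, _ in records:
        if ppid in known:
            neighbors[pid].add(ppid)
            neighbors[ppid].add(pid)
    return neighbors


def aggregate(records, neighbors):
    """One round of mean aggregation over each node and its direct neighbors."""
    feats = {pid: vec for pid, _, vec in records}
    out = {}
    for pid, vec in feats.items():
        group = [vec] + [feats[n] for n in sorted(neighbors.get(pid, ()))]
        # Average each feature dimension across the node and its neighbors.
        out[pid] = [sum(col) / len(group) for col in zip(*group)]
    return out


graph = build_graph(events)
embeddings = aggregate(events, graph)
```

After this step, each entry of `embeddings` mixes a process's own features with those of its parent and children, so a downstream anomaly detector sees some local context rather than isolated events. A learned GNN would replace the fixed mean with trainable transformations and typically stack several such rounds.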

Keywords:
Computer science; Metadata; Anomaly detection; Graph; Data mining; Artificial neural network; Generality; Feature engineering; Task (project management); Feature (linguistics); Artificial intelligence; Machine learning; Deep learning; Theoretical computer science; World Wide Web; Engineering

Topics

Software System Performance and Reliability
Physical Sciences → Computer Science → Computer Networks and Communications
Neural Networks and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Data Processing Techniques
Physical Sciences → Engineering → Control and Systems Engineering