JOURNAL ARTICLE

KnowLog: Knowledge Enhanced Pre-trained Language Model for Log Understanding

Abstract

Logs as semi-structured text are rich in semantic information, making their comprehensive understanding crucial for automated log analysis. With the recent success of pre-trained language models in natural language processing, many studies have leveraged these models to understand logs. Despite their successes, existing pre-trained language models still suffer from three weaknesses. Firstly, these models fail to understand domain-specific terminology, especially abbreviations. Secondly, these models struggle to adequately capture the complete log context information. Thirdly, these models have difficulty in obtaining universal representations of different styles of the same logs. To address these challenges, we introduce KnowLog, a knowledge-enhanced pre-trained language model for log understanding. Specifically, to solve the previous two challenges, we exploit abbreviations and natural language descriptions of logs from public documentation as local and global knowledge, respectively, and leverage this knowledge by designing novel pre-training tasks for enhancing the model. To solve the last challenge, we design a contrastive learning-based pre-training task to obtain universal representations. We evaluate KnowLog by fine-tuning it on six different log understanding tasks. Extensive experiments demonstrate that KnowLog significantly enhances log understanding and achieves state-of-the-art results compared to existing pre-trained language models without knowledge enhancement. Moreover, we conduct additional experiments in transfer learning and low-resource scenarios, showcasing the substantial advantages of KnowLog. Our source code and detailed experimental data are available at https://github.com/LeaperOvO/KnowLog.

Keywords:
Computer science Natural language understanding Terminology Artificial intelligence Natural language processing Leverage (statistics) Language model Exploit Documentation Natural language Data science Machine learning

Metrics

21
Cited By
17.57
FWCI (Field Weighted Citation Impact)
56
Refs
0.98
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Software System Performance and Reliability
Physical Sciences →  Computer Science →  Computer Networks and Communications
Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
© 2026 ScienceGate Book Chapters — All rights reserved.