JOURNAL ARTICLE

Software Keyphrase Extraction with Domain-Specific Features

Abstract

Although keyphrases are widely used as brief summaries of documents, most keyphrase extraction work targets arbitrary text. Many document types, however, have specific characteristics that require dedicated pre-processing before keyphrases can be extracted. In the software domain, keyphrases can only be extracted by applying a reverse-engineering approach together with several conversion rules. This paper proposes a mechanism for extracting software keyphrases using domain-specific features. As a case study, the proposed method is applied to Java Archive (JAR) files, a distribution format for Java binaries. Besides pre-processing and conversion rules, the method combines supervised and unsupervised keyphrase extraction to exploit the benefits of both approaches. Furthermore, to capture keyphrase patterns more accurately, software-related features are incorporated alongside standard keyphrase extraction features: software structure, software-related natural language text, and software term association. In the overall evaluation, the proposed method yields moderate R-precision, which suggests that the approach is a reasonable candidate for extracting software keyphrases.
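
For readers unfamiliar with the evaluation measure named in the abstract, R-precision can be sketched as follows: given R reference keyphrases, take the top R ranked candidates and report the fraction that appear in the reference set. The function name, the sample data, and the case-insensitive exact matching below are assumptions made for illustration; the abstract does not state how the paper matches candidates against reference keyphrases.

```python
def r_precision(ranked_candidates, gold_keyphrases):
    """Fraction of the top-R ranked candidates found in the gold set,
    where R is the number of distinct gold keyphrases."""
    # Assumption: matching is exact after trimming and lowercasing.
    gold = {k.strip().lower() for k in gold_keyphrases}
    r = len(gold)
    if r == 0:
        return 0.0
    top_r = {c.strip().lower() for c in ranked_candidates[:r]}
    hits = len(top_r & gold)
    return hits / r


# Hypothetical example data, not taken from the paper.
ranked = ["xml parser", "logging", "http client", "string utils"]
gold = ["XML parser", "HTTP client", "json"]
score = r_precision(ranked, gold)  # 2 of the top 3 candidates match
```

Because the cutoff equals the number of reference keyphrases, the score does not depend on how many candidates the extractor returns beyond R.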

Keywords:
Computer science; Software; Java; Domain (mathematical analysis); Exploit; Artificial intelligence; Natural language processing; Data mining; Programming language

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.56
Refs: 23
Citation Normalized Percentile: 0.88

Topics

Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems