Impact analysis of keyword extraction using contextual word embedding

Qasim Khan; Abdul Shahid; M. Irfan Uddin; Muhammad Roman; Abdullah Alharbi; Wael Alosaimi; Jameel Almalki; Saeed M. Alshahrani

doi:10.7717/peerj-cs.967

Year: 2022; Journal: PeerJ Computer Science; Vol: 8; Article: e967; Publisher: PeerJ, Inc.

Abstract

A document’s keywords provide high-level descriptions of its content that summarize its central themes, concepts, ideas, or arguments. These descriptive phrases help algorithms find relevant information quickly and efficiently, and they play a vital role in document processing tasks such as indexing, classification, clustering, and summarization. Traditional keyword extraction approaches rely mostly on the statistical distribution of key terms in a document. Recent technological advances indicate that contextual information is critical for determining the semantics of a text, which suggests that context-based features may also benefit keyword extraction. For example, the context of a phrase can be described simply by the word immediately before or after it. This research presents several experiments to validate that context-based keyword extraction performs significantly better than traditional methods. The proposed work identifies a group of important words or phrases in a document’s content that reflect the authors’ main ideas, concepts, or arguments, and it uses contextual word embeddings, through the KeyBERT model, to extract them. The results are compared with those of older approaches: TextRank, RAKE, Gensim, YAKE, and TF-IDF. The experiments use the Journal of Universal Computer Science (JUCS) dataset, and only the abstracts were used to generate keywords for each research article. The KeyBERT model outperformed the traditional approaches in producing keywords similar to the authors’ own, with an average similarity of 51% to the author-assigned keywords.
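The abstract gives one example of a context feature: the previous or next word of the phrase of interest. The following sketch in Python shows how that feature could be collected. It is illustrative only and is not the authors' code. The tokenizer, the `<s>`/`</s>` boundary markers, and the helper names `tokenize` and `context_features` are assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens. Punctuation is dropped, so in this
    # simplified version the context can cross sentence boundaries.
    return re.findall(r"[a-z0-9]+", text.lower())


def context_features(tokens: list[str], phrase: list[str]) -> list[tuple[str, str]]:
    """Return the (previous word, next word) pair for every occurrence of phrase.

    "<s>" and "</s>" mark the start and end of the document.
    """
    n = len(phrase)
    features = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            prev_word = tokens[i - 1] if i > 0 else "<s>"
            next_word = tokens[i + n] if i + n < len(tokens) else "</s>"
            features.append((prev_word, next_word))
    return features


doc = "Keyword extraction helps indexing. Contextual keyword extraction uses embeddings."
print(context_features(tokenize(doc), ["keyword", "extraction"]))
# prints [('<s>', 'helps'), ('contextual', 'uses')]
```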
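KeyBERT embeds the whole document and each candidate phrase. It then ranks the candidates by how close their embeddings are to the document embedding, measured by cosine similarity. The sketch below shows only that ranking step, and it makes several assumptions:

- `toy_embed` is a hypothetical bag-of-words stand-in for a contextual (BERT-based) encoder. It is used only so the example runs without downloading a model.
- The candidate filter is a simplification: it keeps 1- to 2-grams within a sentence that neither begin nor end with a stop word.
- The small stop-word list is illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]

# Small illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
              "is", "it", "of", "on", "that", "the", "to", "with"}


def candidates(text: str, max_n: int = 2) -> list[str]:
    """Collect 1..max_n-grams inside each sentence that neither start nor end with a stop word."""
    found = set()
    for sentence in re.split(r"[.!?]", text.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] not in STOP_WORDS and gram[-1] not in STOP_WORDS:
                    found.add(" ".join(gram))
    return sorted(found)


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a contextual encoder: a bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def rank_keywords(doc: str, embed: Callable[[str], Vector] = toy_embed,
                  top_n: int = 5, max_n: int = 2) -> list[tuple[str, float]]:
    """Score every candidate against the whole document and keep the top_n most similar."""
    doc_vec = embed(doc)
    scored = [(cand, cosine(embed(cand), doc_vec)) for cand in candidates(doc, max_n)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest score first, ties alphabetical
    return scored[:top_n]


doc = ("Keyword extraction with contextual embeddings. "
       "Contextual embeddings improve keyword extraction.")
for phrase, score in rank_keywords(doc, top_n=2):
    print(f"{phrase}: {score:.3f}")
```

The encoder is passed in as a parameter, so a real sentence encoder can replace `toy_embed` without changing the ranking code. With the `keybert` package installed, a comparable pipeline backed by a sentence-transformer model is `KeyBERT().extract_keywords(doc, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=5)`. The abstract does not state which settings the authors used.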
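The abstract reports a 51% average similarity to author-assigned keywords but does not define the measure. The sketch below shows one hypothetical way such an evaluation could be computed. Each document is scored by the fraction of its author keywords that also appear among the extracted keywords, after case and whitespace normalization. The scores are then averaged over all documents. The metric, the helper names, and the sample data are all assumptions for illustration.

```python
def normalize(keyword: str) -> str:
    # Case-fold and collapse internal whitespace so "Keyword  Extraction" matches "keyword extraction".
    return " ".join(keyword.lower().split())


def doc_similarity(extracted: list[str], author: list[str]) -> float:
    """Fraction of author-assigned keywords found among the extracted keywords."""
    gold = {normalize(k) for k in author}
    if not gold:
        return 0.0
    found = {normalize(k) for k in extracted}
    return len(gold & found) / len(gold)


def average_similarity(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Mean per-document similarity over (extracted, author) keyword pairs."""
    if not pairs:
        return 0.0
    return sum(doc_similarity(ext, auth) for ext, auth in pairs) / len(pairs)


# Made-up example data, not taken from the JUCS dataset.
pairs = [
    (["keyword extraction", "word embedding", "bert"], ["Keyword Extraction", "BERT"]),
    (["tf-idf", "ranking"], ["TextRank", "Ranking"]),
]
print(f"{average_similarity(pairs):.0%}")  # prints 75%
```

Exact matching is strict: a near-miss such as "word embeddings" versus "word embedding" counts as a miss. Partial-match or embedding-based comparisons would generally score higher.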

Keywords:

Computer science, Keyword extraction, Phrase, Automatic summarization, Information retrieval, Natural language processing, Word, Word embedding, Context, Cluster analysis, tf–idf, Rank, Search engine indexing, Artificial intelligence, Semantics (computer science), Similarity, Term, Embedding, Linguistics
