JOURNAL ARTICLE

Unsupervised Author Disambiguation using Heterogeneous Graph Convolutional Network Embedding

Abstract

Many people in the real world share the same name. When a digital library user searches for an author name, they may see a mixture of publications by different authors who have that name. Distinguishing between these authors is an important prerequisite for improving the quality of services and content in digital libraries. The general task of author disambiguation is to assign publications that carry an identical name, or names with highly similar spellings, to distinct person entities. In recent years, much research has been devoted to this challenging task. However, some approaches rely heavily on external knowledge bases and manually annotated data, and some unsupervised approaches require complex feature engineering. In this paper, we propose a novel and efficient author disambiguation framework that needs no labeled data. We first construct a heterogeneous publication network for each ambiguous name. We then learn publication representations with our proposed heterogeneous graph convolutional network embedding method, which encodes both graph structure and node attribute information. Finally, we propose a graph-enhanced clustering method for name disambiguation that greatly accelerates the clustering process and does not require the number of distinct persons to be known in advance. Our framework can be continually retrained and applied to incremental disambiguation as new publications are added. Experimental results on two datasets show that our framework clearly outperforms several state-of-the-art author disambiguation methods.
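The abstract names three stages (build a heterogeneous publication network per ambiguous name, embed publications by propagating attributes over that network, then cluster without knowing the number of persons) but does not give the model details. The following is only a minimal sketch of that pipeline shape under our own assumptions, not the authors' method. Assumed and hypothetical: the two relation types (`coauthor`, `venue`), bag-of-words title features, a single parameter-free mean-aggregation step standing in for the learned heterogeneous GCN layer, the cosine `threshold`, and every function name. The "graph-enhanced" part is interpreted here as comparing only publications that are linked in the network, which avoids all-pairs comparison and lets the number of clusters emerge from union-find.

```python
from collections import defaultdict
import math

# Hypothetical relation types; the paper's actual network schema may differ.
RELATIONS = ("coauthor", "venue")


def build_hetero_graph(pubs):
    """One adjacency (list of neighbor sets) per relation type."""
    n = len(pubs)
    adj = {rel: [set() for _ in range(n)] for rel in RELATIONS}
    index = defaultdict(list)  # (relation, value) -> publication ids
    for i, p in enumerate(pubs):
        for a in p["coauthors"]:
            index[("coauthor", a)].append(i)
        index[("venue", p["venue"])].append(i)
    for (rel, _), members in index.items():
        for i in members:
            adj[rel][i].update(j for j in members if j != i)
    return adj


def featurize(pubs):
    """Bag-of-words node attributes built from title tokens."""
    vocab = sorted({w for p in pubs for w in p["title"].lower().split()})
    pos = {w: k for k, w in enumerate(vocab)}
    feats = []
    for p in pubs:
        v = [0.0] * len(vocab)
        for w in p["title"].lower().split():
            v[pos[w]] += 1.0
        feats.append(v)
    return feats


def propagate(feats, adj):
    """One parameter-free step (assumed stand-in for a trained GCN layer):
    average a node's own features with the mean neighbor features of
    each relation type in which it has neighbors."""
    out = []
    for i, x in enumerate(feats):
        acc, parts = list(x), 1  # self-loop
        for neighbors in adj.values():
            nbrs = neighbors[i]
            if not nbrs:
                continue
            for k in range(len(acc)):
                acc[k] += sum(feats[j][k] for j in nbrs) / len(nbrs)
            parts += 1
        out.append([a / parts for a in acc])
    return out


def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v


def cluster(pubs, threshold=0.5):
    """Merge linked publications whose embeddings have cosine >= threshold.
    Only graph neighbors are compared; the number of persons is not an input."""
    adj = build_hetero_graph(pubs)
    emb = [l2_normalize(v) for v in propagate(featurize(pubs), adj)]
    parent = list(range(len(pubs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pubs)):
        for neighbors in adj.values():
            for j in neighbors[i]:
                if j <= i:
                    continue  # each linked pair is checked once
                cos = sum(a * b for a, b in zip(emb[i], emb[j]))
                if cos >= threshold:
                    parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(pubs)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

For instance, two publications that share a coauthor and a venue and have similar titles are merged, while publications that are never linked in the network are never compared at all. A real implementation would replace the mean aggregation with trained, normalized per-relation weights and tune the merge criterion on held-out names.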

Keywords:
Computer science, Cluster analysis, Graph, Feature engineering, Information retrieval, Entity linking, Artificial intelligence, Embedding, Natural language processing, Word embedding, Graph embedding, Deep learning, Theoretical computer science, Knowledge base

Metrics

Cited by: 32
FWCI (Field Weighted Citation Impact): 2.93
References: 53
Citation Normalized Percentile: 0.91

Topics

Data Quality and Management (Social Sciences → Decision Sciences → Management Science and Operations Research)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Biomedical Text Mining and Ontologies (Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology)