JOURNAL ARTICLE

Special section on mining knowledge from scientific data

Abstract

The past two decades have witnessed rapid growth in scientific publications across all areas of research. Easier access to published literature (open access, arXiv preprints, etc.), along with recent developments in computational methods, has given researchers a productive platform for studying vast amounts of scholarly data. Scholarly data mining has thus made it possible to do “research about research!” It plays a vital role in scientometrics, bibliometrics, webometrics, and altmetrics, which require sophisticated algorithms to curate scholarly data and derive useful insights from it. Moreover, the knowledge extracted from scientific data can support several decision-making processes, such as setting policy for fund disbursement, identifying research gaps in a department and recruiting faculty to fill them, and anticipating upcoming research areas. On the other hand, the increasing popularity of these metrics as a measure of the quality of research output, for determining university rankings, and in decision making (tenure and recruitment decisions) has also given rise to objectionable practices that artificially boost them (self-citations, citation cliques, etc.). Given that, is it always right to consider these metrics a reliable proxy for research quality? How should decision makers and policymakers use these metrics while accounting for such malpractices? This special section aims to bring together the latest research on issues related to knowledge extraction and deriving insights from scientific data. Of special interest is the role these metrics play in policy and decision making, both positive and negative. We welcomed theoretical research, empirical research, and case studies that lead to the development of novel algorithms, tools, techniques, metrics, decisions and measurements related to scholarly data. A total of 12 papers were submitted to this special section.
All the submitted papers underwent a rigorous review process. Each paper was reviewed by at least two reviewers and went through at least two rounds of revision. The contributions of the accepted papers are briefly summarized below.
Outlier detection is a major research topic in data mining. In scientometric research, however, such outliers indicate malpractices in scientific research, resulting in issues such as citation cartels and citation stacking. Chakraborty et al. (2020) defined a diverse feature set that can identify such extreme outliers and reasoned about them. They also showed the effect of such outlier behaviour on bibliographic metrics such as the h-index and impact factor.
Madisetty et al. (2020) proposed a tool to extract inline mathematical expressions from scientific articles. This is a major problem in scientific document processing, as mathematical expressions often act as a bottleneck because of cryptic symbols that a parser is unable to extract. The authors proposed two models: the first uses a conditional random field with hand-crafted features, and the second uses a bidirectional LSTM. This work contributes to building a real-world tool and can also act as a plugin for a scientific document parser.
Document classification has long been an important problem in text processing. In scientific data mining, it is needed to categorize scientific papers by topics, keywords, etc. Masmoudi et al. (2020) proposed a novel hierarchical document classification approach that uses limited labelled data. They used the co-training paradigm to exploit content and bibliographic coupling information as two distinct views of a paper. A large amount of unlabelled data was used during co-training.
We hope that this special section will provide insights, analysis, and understanding about “scientific research” and help build the foundation for future research and development in scientometrics.
We sincerely thank the Editor-in-Chief of this journal, Dr Jon G. Hall, for accepting our request to organize the special section with the Expert Systems journal. We would also like to thank the entire editorial team of the journal, the authors who submitted their valuable research to this special section, and the reviewers for their support and their timely evaluations and comments.

Keywords:
Computer science, Data science, Popularity, Scientometrics, Bibliometrics, Citation, Quality (philosophy), Cyberinfrastructure, Ingenuity, Altmetrics, Knowledge extraction, Data mining, World Wide Web, Political science

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Data-Driven Disease Surveillance
Health Sciences →  Medicine →  Epidemiology
