JOURNAL ARTICLE

CERMINE: automatic extraction of structured metadata from scientific literature

Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, Łukasz Bolikowski

Year: 2015 Journal: International Journal on Document Analysis and Recognition (IJDAR) Vol: 18 (4) Pages: 317-335 Publisher: Springer Science+Business Media

Abstract

CERMINE is a comprehensive open-source system for extracting structured metadata from born-digital scientific articles. The system is based on a modular workflow whose loosely coupled architecture allows individual components to be evaluated and adjusted, makes it easy to improve or replace independent parts of the algorithm, and facilitates future extension of the architecture. Most steps are implemented with supervised and unsupervised machine learning techniques, which simplifies adapting the system to new document layouts and styles. An evaluation of the extraction workflow on a large dataset showed good performance for most metadata types, with an average F score of 77.5%. CERMINE is available under an open-source licence and can be accessed at http://cermine.ceon.pl. In this paper, we outline the overall workflow architecture and provide details about the implementation of individual steps. We also thoroughly compare CERMINE to similar solutions, describe the evaluation methodology, and report its results.
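
As a rough illustration of how an average F score across metadata types can be computed, the sketch below assumes the common F1 definition (harmonic mean of precision and recall) and a plain unweighted mean over metadata types. The function names and the precision/recall figures are hypothetical and are not taken from the paper, whose own evaluation methodology may differ.

```python
from statistics import mean


def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_score(per_type: dict[str, tuple[float, float]]) -> float:
    """Unweighted mean of the per-type F1 scores."""
    return mean(f_score(p, r) for p, r in per_type.values())


# Hypothetical (precision, recall) pairs, for illustration only;
# these are NOT results reported for CERMINE.
example = {
    "title": (0.90, 0.90),
    "authors": (0.80, 0.80),
    "abstract": (0.50, 1.00),
}
print(round(average_f_score(example), 4))
```

An unweighted mean treats every metadata type equally, so a rarely occurring type influences the average as much as a common one; a weighted mean would be the alternative if types should count in proportion to their frequency.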

Keywords:
Metadata, Workflow, Computer science, Implementation, Modular design, Architecture, Information retrieval, Component (thermodynamics), Software engineering, Workflow engine, Data mining, Database, World Wide Web, Programming language

Metrics

Cited By: 179
FWCI (Field Weighted Citation Impact): 19.80
Refs: 31
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Scientific Computing and Data Management
Social Sciences →  Decision Sciences →  Information Systems and Management
Semantic Web and Ontologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Research Data Management Practices
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Epistemic logic and CERMINE: a logical model for automatic extraction of structured metadata

Simone Cuconato

Journal: DOAJ (DOAJ: Directory of Open Access Journals) Year: 2021 Vol: 9 (1) Pages: 161-172
JOURNAL ARTICLE

AUTOMATIC METADATA EXTRACTION FROM SCIENTIFIC PDF DOCUMENTS

Journal: Informatics and Applications Year: 2018
JOURNAL ARTICLE

Automatic Literature Metadata Extraction from DataCite Services

Kun Ma

Journal: Recent Patents on Computer Science Year: 2018 Vol: 11 (1) Pages: 25-31