JOURNAL ARTICLE

CERMINE -- Automatic Extraction of Metadata and References from Scientific Literature

Abstract

CERMINE is a comprehensive open-source system for extracting metadata and parsed bibliographic references from born-digital scientific articles. The system is built on a modular workflow whose architecture allows individual steps to be trained and evaluated separately, makes it easy to modify or replace individual components, and simplifies further extension of the architecture. Most steps are implemented with supervised and unsupervised machine-learning techniques, which simplifies adapting the system to new document layouts. The paper describes the overall workflow architecture, provides details about the individual implementations, and reports the evaluation methodology and results. The CERMINE service is available at http://cermine.ceon.pl.

Keywords:
Computer science, Metadata, Workflow, Implementation, Modular design, Process (computing), Architecture, Software engineering, Information retrieval, Parsing, Service (business), World Wide Web, Database, Artificial intelligence, Programming language

Metrics

Cited By: 35
FWCI (Field Weighted Citation Impact): 7.24
Refs: 16
Citation Normalized Percentile: 0.97
Is in top 1%
Is in top 10%

Topics

Semantic Web and Ontologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
Biomedical Text Mining and Ontologies
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology

Related Documents

JOURNAL ARTICLE

CERMINE: automatic extraction of structured metadata from scientific literature

Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, Łukasz Bolikowski

Journal: International Journal on Document Analysis and Recognition (IJDAR) Year: 2015 Vol: 18 (4) Pages: 317-335
JOURNAL ARTICLE

AUTOMATIC METADATA EXTRACTION FROM SCIENTIFIC PDF DOCUMENTS

Journal: Informatics and Applications Year: 2018
JOURNAL ARTICLE

Automatic Literature Metadata Extraction from DataCite Services

Kun Ma

Journal: Recent Patents on Computer Science Year: 2018 Vol: 11 (1) Pages: 25-31
JOURNAL ARTICLE

Epistemic logic and CERMINE: a logical model for automatic extraction of structured metadata

Simone Cuconato

Journal: DOAJ (DOAJ: Directory of Open Access Journals) Year: 2021 Vol: 9 (1) Pages: 161-172