JOURNAL ARTICLE

Automatic metadata information extraction from scientific literature using deep neural networks

Abstract

We present a novel computer vision-based deep learning approach to metadata extraction, which serves as both a central component of and an ancillary aid to structured information extraction from scientific literature published in varied formats. The number of scientific publications is growing rapidly, but existing methods cannot efficiently combine layout extraction with text recognition because of the variety of formats used by scientific publishers. In this paper, we introduce an end-to-end trainable neural network that segments and labels the main regions of scientific documents while simultaneously recognizing the text in the detected regions. The proposed framework combines object detection techniques based on Recurrent Convolutional Neural Network (RCNN) for scientific document layout detection with a Convolutional Recurrent Neural Network (CRNN) for text recognition. We also contribute a novel data set of main-region annotations for scientific literature metadata extraction to complement the limited availability of high-quality data sets. The final outputs of the network are the text content (payload) and the corresponding labels of the major regions. Our results show that our model outperforms state-of-the-art baselines.
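The data flow described in the abstract (detect labeled regions, recognize the text in each, return label and text pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: it chains the two stages sequentially rather than training them end to end, and the names `Region`, `extract_metadata`, `detect`, `recognize`, and `min_score` are hypothetical. The detector and recognizer are replaced by toy stand-ins so the example runs without a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in page-image pixel coordinates
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    """One detected layout region (hypothetical structure, not from the paper)."""
    label: str    # e.g. "title", "authors", "abstract"
    box: Box
    score: float  # detector confidence in [0, 1]


def extract_metadata(
    page,
    detect: Callable[[object], List[Region]],
    recognize: Callable[[object, Box], str],
    min_score: float = 0.5,  # assumed threshold, chosen for illustration
) -> Dict[str, str]:
    """Run layout detection, then text recognition on each kept region.

    `detect` stands in for the layout detector and `recognize` for the
    text recognizer; both are placeholders supplied by the caller.
    """
    kept = [r for r in detect(page) if r.score >= min_score]
    # Read regions top-to-bottom, then left-to-right.
    kept.sort(key=lambda r: (r.box[1], r.box[0]))

    metadata: Dict[str, str] = {}
    for region in kept:
        text = recognize(page, region.box).strip()
        if not text:
            continue
        # Join multiple regions that share a label (e.g. a split abstract).
        if region.label in metadata:
            metadata[region.label] += " " + text
        else:
            metadata[region.label] = text
    return metadata


# Toy stand-ins so the sketch runs without any trained model.
def fake_detect(page) -> List[Region]:
    return [
        Region("abstract", (50, 300, 550, 500), 0.91),
        Region("title", (50, 40, 550, 90), 0.98),
        Region("authors", (50, 100, 550, 130), 0.87),
        Region("footer", (50, 780, 550, 800), 0.20),  # below threshold
    ]


def fake_recognize(page, box: Box) -> str:
    return page[box]  # the toy "page" maps boxes straight to text


toy_page = {
    (50, 40, 550, 90): "A Sample Title",
    (50, 100, 550, 130): "A. Author, B. Author",
    (50, 300, 550, 500): "We present a sample abstract.",
    (50, 780, 550, 800): "Page 1",
}

result = extract_metadata(toy_page, fake_detect, fake_recognize)
print(result)
```

In a real system the two callables would wrap trained models; keeping them as parameters makes the label-to-text output format easy to test independently of any particular detector or recognizer.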

Keywords:
Computer science; Metadata; Convolutional neural network; Information extraction; Information retrieval; Set (abstract data type); Artificial neural network; Artificial intelligence; Deep learning; Recurrent neural network; Feature extraction; Data set; Complement (music); Field (mathematics); World Wide Web

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.62
Refs: 35
Citation Normalized Percentile: 0.62

Topics

Handwritten Text Recognition Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Image Processing and 3D Reconstruction (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Image Retrieval and Classification Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

JOURNAL ARTICLE

CERMINE: automatic extraction of structured metadata from scientific literature

Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, Łukasz Bolikowski

Journal: International Journal on Document Analysis and Recognition (IJDAR) Year: 2015 Vol: 18 (4) Pages: 317-335

JOURNAL ARTICLE

Information Extraction from Scientific Literature Using Automatic Pattern Expansion

Chang-Hoo Jeong, Hong-Woo Chun, Taehong Kim, Jung-Ho Um, Sa-Kwang Song, Hanmin Jung, Sung-Pil Choi

Journal: International Journal on Advances in Information Sciences and Service Sciences Year: 2013 Vol: 5 (10) Pages: 215-222

BOOK-CHAPTER

Deep Neural Networks for Automated Metadata Extraction

Abdellah El Omari, Jilali Antari, Hamza Elkina

Lecture Notes in Networks and Systems Year: 2024 Pages: 64-74

JOURNAL ARTICLE

Automatic Metadata Extraction from Scientific PDF Documents

Journal: Informatics and Applications Year: 2018