Automatic metadata information extraction from scientific literature using deep neural networks

Huichen Yang; William Hsu

doi:10.1117/12.2623554

JOURNAL ARTICLE

Year: 2022 Pages: 44-44

Abstract

We present a novel computer vision-based deep learning approach for metadata extraction, serving as both a central component of and an ancillary aid to structured information extraction from scientific literature, which appears in many formats. The number of scientific publications is growing rapidly, but existing methods cannot efficiently combine layout extraction with text recognition because of the variety of formats used by scientific publishers. In this paper, we introduce an end-to-end trainable neural network that segments and labels the main regions of scientific documents while simultaneously recognizing the text in the detected regions. The proposed framework combines object detection techniques based on the Region-based Convolutional Neural Network (R-CNN) for document layout detection with a Convolutional Recurrent Neural Network (CRNN) for text recognition. We also contribute a novel data set of main-region annotations for metadata extraction from scientific literature, to help address the limited availability of high-quality data sets. The final outputs of the network are the text content (payload) and the corresponding labels of the major regions. Our results show that our model outperforms state-of-the-art baselines.
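The data flow described in the abstract (detect labeled regions, recognize the text inside each, return label and payload pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a single end-to-end trainable network, whereas the sketch below chains two hypothetical callables, `detect` and `recognize`, in sequence. All names here (`Region`, `crop`, `extract_metadata`, the `min_score` threshold, the toy page made of strings) are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical conventions: a "page" is a list of text rows standing in for
# image rows, and a box is (left, top, right, bottom) in row/column units.
BBox = Tuple[int, int, int, int]
Page = Sequence[str]


@dataclass(frozen=True)
class Region:
    label: str    # e.g. "title", "author", "abstract"
    bbox: BBox    # where the region sits on the page
    score: float  # detector confidence in [0, 1]


def crop(page: Page, bbox: BBox) -> List[str]:
    """Cut the rows and columns covered by bbox out of the page."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in page[top:bottom]]


def extract_metadata(
    page: Page,
    detect: Callable[[Page], List[Region]],
    recognize: Callable[[List[str]], str],
    min_score: float = 0.5,
) -> Dict[str, List[str]]:
    """Run layout detection, then text recognition on each kept region."""
    result: Dict[str, List[str]] = {}
    # Process regions in reading order: top to bottom, then left to right.
    for region in sorted(detect(page), key=lambda r: (r.bbox[1], r.bbox[0])):
        if region.score < min_score:
            continue  # drop low-confidence detections
        text = recognize(crop(page, region.bbox)).strip()
        if text:
            result.setdefault(region.label, []).append(text)
    return result


# Toy stand-ins for the two learned components (assumptions, not real models).
def toy_detect(page: Page) -> List[Region]:
    return [
        Region("author", (0, 1, 20, 2), 0.90),
        Region("title", (0, 0, 20, 1), 0.95),
        Region("abstract", (0, 2, 20, 3), 0.30),  # below the threshold
    ]


def toy_recognize(rows: List[str]) -> str:
    return " ".join(row.strip() for row in rows)


page = ["A Study of Things   ", "Ada Lovelace", "We study things."]
result = extract_metadata(page, toy_detect, toy_recognize)
print(result)
```

In a real system the two callables would be replaced by trained detection and recognition models operating on page images; the dictionary of labels to text corresponds to the "text content (payload) and the corresponding labels" mentioned in the abstract.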

Keywords:

Computer science; Metadata; Convolutional neural network; Information extraction; Information retrieval; Set (abstract data type); Artificial neural network; Artificial intelligence; Deep learning; Recurrent neural network; Feature extraction; Data set; Complement (music); Field (mathematics); World Wide Web

Metrics

FWCI (Field Weighted Citation Impact): 0.62

Citation Normalized Percentile: 0.62

Topics

Handwritten Text Recognition Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing and 3D Reconstruction

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Retrieval and Classification Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

CERMINE: automatic extraction of structured metadata from scientific literature

Information Extraction from Scientific Literature Using Automatic Pattern Expansion

CERMINE -- Automatic Extraction of Metadata and References from Scientific Literature

Deep Neural Networks for Automated Metadata Extraction

AUTOMATIC METADATA EXTRACTION FROM SCIENTIFIC PDF DOCUMENTS