A System for Text Extraction in Complex-Background Document Images

Thành Tâm Nguyên; Hoang Cong-Nhat-Nam; Thanh-Sach Le; Tuan Anh Tran

doi:10.1109/acomp.2019.00017

JOURNAL ARTICLE

Year: 2019 Pages: 65-69

Abstract

Driven by the demand for information transfer, identification, and archiving, the digitization of document images is receiving increasing attention. Detecting text regions is the first and a crucial step in an end-to-end text recognition system. For document images with complex backgrounds, text detection remains challenging because of the variety of text fonts, sizes, and colors, and the complexity of the background. This paper presents a system based on the Connectionist Text Proposal Network (CTPN) for extracting text regions from document images with complex backgrounds. The method consists of two fundamental stages: detecting fine-scale text proposals and extracting text lines from the detected text components. We experimented with several backbone feature extractors, such as VGG19 and ResNet50, and evaluated the system's performance on several datasets: ICDAR2011, ICDAR2013, and a private dataset of real book covers. In addition, we built an online visual evaluation system to compare the results.
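The two stages can be illustrated with a minimal Python sketch. It follows the anchor setup and text-line construction rule of the original CTPN paper (Tian et al., ECCV 2016): fine-scale proposals use ten anchor heights from 11 to about 273 pixels, and a proposal is linked to its nearest horizontal neighbour when the gap is under 50 pixels and the vertical overlap exceeds 0.7. The abstract does not say whether this system keeps those settings, so `MAX_GAP`, `MIN_V_OVERLAP`, the overlap definition (shared extent over the smaller height), the left-edge gap measure, the mutual-link check, and all helper names (`Proposal`, `nearest_neighbour`, `build_text_lines`, `line_box`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed thresholds, taken from the original CTPN paper, not from this system.
MAX_GAP = 50.0        # max horizontal gap (pixels) between linked proposals
MIN_V_OVERLAP = 0.7   # min vertical overlap between linked proposals


@dataclass(frozen=True)
class Proposal:
    """A fine-scale text proposal: a narrow box (x1, y1, x2, y2) in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float


def anchor_heights(n: int = 10, h_min: float = 11.0, ratio: float = 0.7) -> List[float]:
    """CTPN-style anchor heights: start at h_min and divide by `ratio` at each step."""
    return [h_min / ratio ** k for k in range(n)]


def vertical_overlap(a: Proposal, b: Proposal) -> float:
    """Shared vertical extent divided by the smaller height (assumed definition)."""
    inter = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    return inter / min(a.y2 - a.y1, b.y2 - b.y1)


def nearest_neighbour(props: List[Proposal], i: int, direction: int) -> Optional[int]:
    """Closest proposal to the right (direction=+1) or left (-1) of props[i] whose
    left-edge distance is under MAX_GAP and whose vertical overlap exceeds MIN_V_OVERLAP."""
    best, best_gap = None, MAX_GAP
    for j, b in enumerate(props):
        gap = (b.x1 - props[i].x1) * direction  # positive only on the requested side
        if 0 < gap < best_gap and vertical_overlap(props[i], b) > MIN_V_OVERLAP:
            best, best_gap = j, gap
    return best


def build_text_lines(props: List[Proposal]) -> List[List[int]]:
    """Chain mutually nearest neighbours into text lines (proposal indices, left to right)."""
    right = [nearest_neighbour(props, i, +1) for i in range(len(props))]
    left = [nearest_neighbour(props, i, -1) for i in range(len(props))]
    # Keep the link i -> j only if j, in turn, picks i as its nearest left neighbour.
    succ = {i: j for i, j in enumerate(right) if j is not None and left[j] == i}
    has_pred = set(succ.values())
    lines = []
    for start in range(len(props)):
        if start in has_pred:
            continue  # not the leftmost proposal of its chain
        line = [start]
        while line[-1] in succ:
            line.append(succ[line[-1]])
        lines.append(line)
    return lines


def line_box(props: List[Proposal], line: List[int]) -> Tuple[float, float, float, float]:
    """Text-line box as the union of its proposals (a simplification)."""
    return (min(props[i].x1 for i in line), min(props[i].y1 for i in line),
            max(props[i].x2 for i in line), max(props[i].y2 for i in line))


if __name__ == "__main__":
    props = [
        Proposal(0, 10, 16, 40),
        Proposal(16, 12, 32, 41),
        Proposal(32, 11, 48, 40),
        Proposal(200, 10, 216, 40),  # same row, but too far to the right
        Proposal(16, 100, 32, 130),  # a different row
    ]
    lines = build_text_lines(props)
    print(lines)                      # [[0, 1, 2], [3], [4]]
    print(line_box(props, lines[0]))  # (0, 10, 48, 41)
```

Because every link points rightwards and each proposal keeps at most one incoming and one outgoing link, the links form disjoint chains, so each proposal lands in exactly one text line. Taking the union box per line is a simplification; CTPN additionally refines the horizontal ends of a line with side-refinement.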
