JOURNAL ARTICLE

Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering

Hoai Nam Vu, Tuan Anh Tran, In Seop Na, Soo Hyung Kim

Year: 2016 Journal: The International Journal of Networked and Distributed Computing Vol: 4 (1) Pages: 11-11 Publisher: Springer Nature

Abstract

Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify the image regions that contain only text. These textual regions can then either be passed as input to an optical character recognition application or be highlighted to draw the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex background; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all the text regions, so we emphasize identifying important text regions of relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions under various illuminations, sizes, and fonts across various types of background.
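The first two stages named in the abstract can be illustrated with a short sketch. The abstract does not give the paper's actual algorithms or parameters, so everything below is an assumption made for illustration: percentile-based contrast stretching stands in for the preprocessing stage, a 1-D k-means over gray levels stands in for multi-level thresholding, and the function names `stretch_contrast` and `multilevel_labels` are hypothetical. The heuristic and recursive filters are not shown.

```python
import numpy as np


def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Linearly rescale intensities so the chosen percentiles map to 0 and 255.

    Assumption: a simple percentile stretch, not the paper's preprocessing step.
    """
    gray = np.asarray(gray, dtype=np.float64)
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros(gray.shape, dtype=np.uint8)
    out = (gray - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)


def multilevel_labels(gray, k=3, iters=20):
    """Group pixels into k intensity levels with 1-D k-means (Lloyd's algorithm).

    Assumption: k-means on gray levels as a stand-in for multi-level
    thresholding. Returns (labels, centers); centers are sorted ascending,
    so label 0 is the darkest level. The distance matrix is N x k, which is
    fine for small images but memory-hungry for large ones.
    """
    values = np.asarray(gray, dtype=np.float64).ravel()
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centers = np.sort(centers)
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels.reshape(np.shape(gray)), centers


# Synthetic page: light background, a mid-gray "graphic", and a dark "text" bar.
img = np.full((40, 60), 200, dtype=np.uint8)
img[5:20, 5:25] = 120   # non-textual object
img[25:30, 10:50] = 30  # text-like strokes

enhanced = stretch_contrast(img)
labels, centers = multilevel_labels(enhanced, k=3)
text_mask = labels == 0  # assumption: text is the darkest intensity level
```

Treating the darkest level as text only suits dark-on-light documents; light text on a dark background would need the brightest level instead, and a real pipeline would still need the filtering stage to reject dark non-text regions.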

Keywords:
Computer science, Cluster analysis, Hierarchical clustering, Extraction (chemistry), Artificial intelligence, Document clustering, Data mining, Information retrieval, Pattern recognition (psychology)

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.17
Refs: 23
Citation Normalized Percentile: 0.56

Topics

Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Processing and 3D Reconstruction
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition