JOURNAL ARTICLE

Boosting text extraction from biomedical images using text region detection

Abstract

In this paper, we show that text detection optimized for the biomedical domain is important for boosting the recall of text extraction with off-the-shelf OCR engines. Methodologically, we compare OCR performance on raw biomedical images with OCR performance on preprocessed images, in which OCR is applied only to the detected image text regions. To quantify OCR extraction results, we rely on a gold-standard image text corpus with manually identified image text strings. To demonstrate the positive effect on biomedical image retrieval, we apply image text detection and extraction to a large corpus of biomedical images in the Yale Image Finder system. We show that improved text extraction leads to the retrieval of a larger number of relevant images for a set of domain-relevant keyword searches.
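The evaluation described above has two parts. Extraction recall is measured against gold-standard image text strings, and retrieval is assessed by how many images a keyword search returns. The minimal Python sketch below covers both measurements. It assumes case-insensitive exact string matching and micro-averaging over all gold strings; the abstract does not state the paper's actual matching criterion. The helper names (`normalize`, `corpus_recall`, `images_matching`) and the toy inputs are hypothetical illustrations, not code or results from the paper.

```python
from __future__ import annotations

from collections import Counter

# Assumed normalization (not specified in the paper): case-insensitive exact
# matching after trimming whitespace and common surrounding punctuation.
_PUNCT = ".,;:()[]{}\"'"


def normalize(text: str) -> str:
    """Lower-case a string and trim whitespace and surrounding punctuation."""
    return text.strip().strip(_PUNCT).lower()


def _counts(strings: list[str]) -> Counter:
    """Multiset of normalized, non-empty strings."""
    normalized = (normalize(s) for s in strings)
    return Counter(s for s in normalized if s)


def corpus_recall(
    gold_by_image: dict[str, list[str]],
    extracted_by_image: dict[str, list[str]],
) -> float:
    """Micro-averaged recall: matched gold strings / all gold strings.

    Counter '&' keeps the minimum count per string, so a label that occurs
    twice in an image must be extracted twice to be credited twice.
    """
    matched = total = 0
    for image_id, gold in gold_by_image.items():
        gold_counts = _counts(gold)
        found_counts = _counts(extracted_by_image.get(image_id, []))
        matched += sum((gold_counts & found_counts).values())
        total += sum(gold_counts.values())
    return matched / total if total else 0.0


def images_matching(keyword: str, extracted_by_image: dict[str, list[str]]) -> list[str]:
    """IDs of images whose extracted text contains the keyword as a whole string."""
    key = normalize(keyword)
    return sorted(
        image_id
        for image_id, strings in extracted_by_image.items()
        if key in _counts(strings)
    )


# Toy, made-up inputs for illustration only (not results from the paper).
gold = {"fig1": ["p53", "MDM2", "apoptosis"], "fig2": ["Western", "blot"]}
raw_ocr = {"fig1": ["p5S", "MDM2"], "fig2": ["Western"]}
region_ocr = {"fig1": ["p53", "MDM2", "apoptosis"], "fig2": ["Western", "b1ot"]}

print(corpus_recall(gold, raw_ocr))              # 0.4
print(corpus_recall(gold, region_ocr))           # 0.8
print(images_matching("apoptosis", raw_ocr))     # []
print(images_matching("apoptosis", region_ocr))  # ['fig1']
```

Multiset intersection is used instead of plain set intersection so that repeated labels, such as identical axis ticks, are not over-credited. If OCR character errors (like "b1ot" for "blot") should earn partial credit, an edit-distance match would be a reasonable alternative to exact matching.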
