JOURNAL ARTICLE

Converting printed Sinhala documents to formatted editable text

Abstract

Digitizing printed documents has long been a challenge for the computing community. Digitized text not only lets users easily modify and reprint printed documents but has also become a necessity, given the word-search capabilities that users now expect. Converting a printed document into a stream of characters using OCR (optical character recognition) techniques has been widely researched, and a number of well-established algorithms are available in the literature. However, preserving the formatting of the original document has received little attention. The contribution of this paper is twofold: (1) applying known OCR techniques to Sinhala, one of Sri Lanka's native languages, and addressing the challenges in doing so, and (2) preserving a selected set of formatting features of the printed document in the converted editable text. To this end, this paper outlines the design and implementation of a software system that converts a scanned paper document written in Sinhala into formatted editable text, and describes how this system is integrated into an open-source word processing tool.
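The abstract does not say how the preserved formatting is represented. Purely as an illustrative sketch, assume that recognized text is grouped into runs, each carrying a few formatting attributes such as bold, italic, and font size. The hypothetical `TextRun` type and `runs_to_html` function below show one way such runs could be serialized into markup that a word processor can import. Both names, the chosen attributes, and the use of HTML as the output format are assumptions, not the paper's actual design.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    """Hypothetical: a run of recognized characters sharing one set of formatting attributes."""
    text: str
    bold: bool = False
    italic: bool = False
    font_size_pt: float = 12.0


def runs_to_html(paragraphs: list[list[TextRun]]) -> str:
    """Hypothetical serializer: one <p> per recognized paragraph, one styled <span> per run.

    HTML is an assumed target format chosen only because word processors can open it.
    """
    lines = []
    for runs in paragraphs:
        parts = []
        for run in runs:
            piece = escape(run.text)  # escape &, <, > and quotes in the recognized text
            if run.bold:
                piece = f"<b>{piece}</b>"
            if run.italic:
                piece = f"<i>{piece}</i>"
            # :g drops a trailing ".0", so 16.0 is written as "16pt"
            parts.append(f'<span style="font-size:{run.font_size_pt:g}pt">{piece}</span>')
        lines.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(lines)


if __name__ == "__main__":
    # A single paragraph: a bold 16 pt Sinhala word followed by a plain 12 pt word.
    doc = [[TextRun("සිංහල", bold=True, font_size_pt=16), TextRun(" අකුරු")]]
    print(runs_to_html(doc))
```

Because Sinhala text is stored as Unicode code points, the serializer passes it through unchanged and only escapes markup-significant characters. A real system would also need to detect the attributes themselves (for example, stroke thickness for bold), which this sketch takes as given.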
