JOURNAL ARTICLE

Ottoman OCR: Printed Naskh Font

İshak DÖLEK, Atakan Kurt

Year: 2021
Journal: 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
Pages: 1-5

Abstract

We present an OCR tool developed for printed Ottoman documents in Naskh font as part of a project named End-to-End Conversion of Ottoman Documents to Modern Turkish. This tool uses a deep learning model trained on a data set containing original and synthetic documents. We conducted an experimental comparison of this tool, named Osmanlica.com, with the Tesseract Arabic, Tesseract Persian, ABBYY FineReader, Miletos and Google Docs OCR tools (or models) using a test data set of 21 pages of original documents. With character recognition accuracy rates of 88.64% raw, 95.92% normalized and 97.18% joined, Osmanlica.com outperformed the other tools by a marked margin. Osmanlica.com also achieved 58% word recognition accuracy, the only rate over 50% among the OCR tools compared. We shared the test data set, ground truth, OCR outputs and the test program, written in Python using difflib, at osmanlica.com/test for independent verification.
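As an illustration of how character and word recognition accuracy can be computed with difflib, the following is a minimal sketch and not the authors' test program. It assumes accuracy is the share of ground-truth items that `difflib.SequenceMatcher` aligns with the OCR output. The function names are hypothetical, and the abstract does not define the raw, normalized and joined variants, so none of them is reproduced here.

```python
from difflib import SequenceMatcher


def matched_fraction(truth, output):
    """Share of ground-truth items that difflib aligns with the OCR output.

    Works on any pair of sequences: strings (characters) or lists (words).
    Assumption: accuracy = matched items / ground-truth length.
    """
    if not truth:
        # An empty ground truth is fully matched only by an empty output.
        return 1.0 if not output else 0.0
    # autojunk=False disables difflib's popular-item heuristic, which can
    # otherwise skip frequent characters in long texts.
    matcher = SequenceMatcher(None, truth, output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def char_accuracy(truth: str, output: str) -> float:
    """Character-level accuracy of an OCR output against the ground truth."""
    return matched_fraction(truth, output)


def word_accuracy(truth: str, output: str) -> float:
    """Word-level accuracy, comparing whitespace-separated tokens."""
    return matched_fraction(truth.split(), output.split())


if __name__ == "__main__":
    ground_truth = "one two three four"
    ocr_output = "one tvo three four"
    print(f"character accuracy: {char_accuracy(ground_truth, ocr_output):.2%}")
    print(f"word accuracy:      {word_accuracy(ground_truth, ocr_output):.2%}")
```

A single wrong character costs one item at the character level but a whole token at the word level, which is one reason a word accuracy figure can sit well below the character accuracy figures reported for the same output.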

Keywords:
Font, Python (programming language), Computer science, Optical character recognition, Artificial intelligence, Natural language processing, Turkish, Test set, Arabic, Ground truth, Persian, Training set, Test data, Information retrieval, Speech recognition, Programming language, Linguistics

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.19
Refs: 20
Citation Normalized Percentile: 0.57

Topics

Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Processing and 3D Reconstruction
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
