Optical Character Recognition (OCR) is a technique that automatically extracts the information one needs from scanned images of typewritten or printed text by converting them into machine-encoded text. OCR is now embedded in many applications and websites, but most of these systems target Latin-script languages such as English. India is a multilingual country with more than 19,500 languages or dialects spoken as mother tongues. Partly because of this diversity, comparatively little OCR work has been reported for Indian languages. Most Indian languages have large character sets whose characters are structurally complex compared with those of Latin-based scripts. Applying transfer learning from Latin-based OCR systems to Telugu is therefore a difficult undertaking. Neural networks are well equipped to meet the difficulty of Telugu OCR. This work aims to develop a multilingual translation OCR system that can recognize basic printed text in the Telugu script.
C. Vasantha Lakshmi, C. Patvardhan
B. Revathi, B. N. V. Narasimha Raju, Kaki Surendranath, K. Mounika, G. Sravani
Sunkara Nagendra Kumar, B. Revathi, P. N. K. A. V. V. S. Vidya, Sk. Chowsar Neha
Safwa Taha, Yusra Babiker, Mohamed Abbas
P. Malathi, Chandrakanth G. Pujari