A high accuracy OCR system for printed Telugu text

C. Vasantha Lakshmi; C. Patvardhan

doi:10.1109/tencon.2003.1273274


JOURNAL ARTICLE


Year: 2004 Pages: 725-729



Abstract

Telugu is one of the oldest and most popular languages of India. This paper describes the design and development of a Telugu optical character recognition system for printed text (TOSP). The preprocessing tasks considered are conversion of a grey-scale image to a binary image, image rectification, skew detection and removal, and segmentation of text into lines, words, and basic symbols. Basic symbols are identified as the fundamental unit of segmentation and are recognized by neural recognizers. The recognizers are aided by an improvement module that uses additional logic to recognize confusing symbols correctly, which increases recognition accuracy. The combinations of basic symbols that together form the characters and compound characters of Telugu are then determined to complete the recognition process. A special feature of TOSP is that it is designed to handle multiple sizes and multiple fonts. Further, the output produced by TOSP can be opened directly in any Indian-language software that supports transliteration into Telugu script, and then edited. Several such software packages are popular and readily available.
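The abstract names the preprocessing stages but does not say how they are implemented. As a rough illustration only, the binarization and segmentation stages can be sketched with a fixed threshold and projection profiles. The fixed threshold, the projection-profile method, and every function name below are assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grey levels in the range 0..255


def binarize(grey: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0.

    A fixed threshold is an assumption of this sketch; the abstract
    does not say how TOSP chooses its threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in grey]


def segment_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # the run ended at a blank entry
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the image edge
    return runs


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Find text lines as row ranges, using the horizontal projection profile."""
    return segment_runs([sum(row) for row in binary])


def segment_columns(binary: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Find ink runs inside one text line, using the vertical projection profile."""
    band = binary[top:bottom]
    if not band:
        return []
    return segment_runs([sum(col) for col in zip(*band)])


# Tiny synthetic page: two "lines" of ink separated by a blank row.
grey = [
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255,   0, 255],
    [255,   0,   0, 255,   0, 255],
    [255, 255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
binary = binarize(grey)
lines = segment_lines(binary)
first_line_runs = segment_columns(binary, *lines[0])
print(lines, first_line_runs)
```

On this synthetic input, `lines` is `[(1, 3), (4, 5)]` and `first_line_runs` is `[(1, 3), (4, 5)]`. Separating words from basic symbols would additionally need a rule on gap widths, and symbols that overlap along the projection axis would likely need something beyond plain projection profiles, such as connected-component analysis; the abstract does not specify either.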

Keywords:

Telugu; Computer science; Optical character recognition; Artificial intelligence; Feature (linguistics); Software; Preprocessor; Speech recognition; Natural language processing; Skew; Segmentation; Pattern recognition (psychology); Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.19

Topics

Handwritten Text Recognition Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Retrieval and Classification Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Vehicle License Plate Recognition

Physical Sciences → Engineering → Media Technology


Related Documents

An optical character recognition system for printed Telugu text

OCR of Printed Telugu Text with High Recognition Accuracies

A multi-font OCR system for printed Telugu text

Multilingual Translational Optical Character Recognition System for Printed Telugu Text

Fringe Map Based Text Line Segmentation of Printed Telugu Document Images