Optical character recognition (OCR) is the technique of converting images of printed or written text, whether from scanned documents, images of documents, or simple photos, into machine-encoded text. OCR has proven very useful for digitizing documents and making them easier to analyze. Despite advances in the technology since its introduction, there are still areas where OCR falls short. If the written text is illegible, or the OCR software is not powerful enough, the result is an inaccurate transcription. This research work aims to address this shortcoming by post-processing OCR outputs, primarily using Transformers such as BERT, in a two-step pipeline that corrects these mistakes and improves the quality of the resulting document.
Srinivas Kumar Palvadi, Krishna Prasad K