In this paper, we take a pattern recognition approach to correcting errors in text generated from printed documents using optical character recognition (OCR). We apply a very general, theoretically optimal model to the problem of OCR word correction, introduce practical methods for parameter estimation, and evaluate performance on real data.
Simon Flachs, Ophélie Lacroix, Anders Søgaard
Mikael Lassen, Adriano Berni, Lars S. Madsen, Radim Filip, Ulrik L. Andersen
Mohammad Hoseyn Sheykholeslam, Behrouz Minaei‐Bidgoli, Hossein Juzi