This paper proposes a handwritten character string recognition method for reading Japanese mail addresses with a very large vocabulary. Recognition is performed by classification-embedded lexicon matching based on over-segmentation. The lexicon contains 111,349 address phrases and is represented in a trie structure. During recognition, the input text line image is matched against all lexicon entries by beam search to obtain reliable character segmentation and to retrieve valid phrases. A classifier embedded in the lexicon matching selects, from a dynamic set of characters, those that match a candidate pattern. Together, the beam search and the character classification enable accurate phrase identification in real time. In experiments on 3,589 live mail images, the proposed method achieved a correct rate of 83.68% with an error rate below 1%.
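The matching scheme described above can be illustrated with a small sketch. It is not the authors' implementation: the function names (`build_trie`, `match`, `toy_classify`), the cost values, the beam width, and the toy glyph table are all assumptions made for illustration. Over-segmentation is simulated by a list of primitive segment labels, candidate patterns are formed by merging consecutive segments, and the classifier only scores the characters that the current trie node allows (the "dynamic set").

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    phrase: Optional[str] = None  # set on nodes that end a lexicon entry


def build_trie(phrases):
    """Store every lexicon phrase in a character trie."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.phrase = phrase
    return root


def match(segments, root, classify, beam_width=5, max_merge=3):
    """Beam search over (segments consumed, trie node) hypotheses.

    beams[i] holds (cost, node) pairs that have consumed i primitive segments.
    Returns (cost, phrase) for the cheapest complete match, or None.
    """
    n = len(segments)
    beams = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, root)]
    for i in range(n):
        # Keep only the best hypotheses at this position (beam pruning).
        beams[i] = sorted(beams[i], key=lambda h: h[0])[:beam_width]
        for cost, node in beams[i]:
            if not node.children:
                continue  # phrase already ended; nothing left to match
            for k in range(1, max_merge + 1):
                if i + k > n:
                    break
                # Candidate pattern: k consecutive primitive segments merged.
                pattern = tuple(segments[i:i + k])
                # Classify only against characters allowed by the trie node.
                for ch, c in classify(pattern, node.children.keys()).items():
                    beams[i + k].append((cost + c, node.children[ch]))
    finals = [(c, nd.phrase) for c, nd in beams[n] if nd.phrase is not None]
    return min(finals, key=lambda f: f[0], default=None)


# Hypothetical stand-in for a real character classifier: each character has a
# fixed sequence of primitive segments; a perfect match costs 0, else 1.
TOY_GLYPHS = {"A": ("a",), "B": ("b1", "b2"), "C": ("c",), "D": ("d1", "d2")}


def toy_classify(pattern, candidates):
    return {ch: 0.0 if TOY_GLYPHS.get(ch) == pattern else 1.0
            for ch in candidates}


trie = build_trie(["AB", "AC", "BD"])
result = match(["a", "b1", "b2"], trie, toy_classify)
print(result)
```

In this toy run the segments `b1` and `b2` only score well when merged into one candidate pattern, so the search settles the segmentation and the phrase at the same time. A real system would replace `toy_classify` with a trained classifier and would likely add geometric or contextual costs, which the abstract does not detail.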
Cheng-Lin Liu, Masashi Koga, Hiromichi Fujisawa
Xiaoyan Zhou, Jinlun Yu, Changfu Liu, Takeshi Nagasaki, Katsumi Marukawa
Qiang Fu, Xiaoqing Ding, Yan Jiang