Gang Wang, Hua ping Zhang, Jian yun Shang
Scene text recognition remains a challenging problem because of the variety of text styles and image distortions. This paper proposes an end-to-end trainable model with a rectification module network. The rectification module adopts a polynomial-based spatial transformer network to rectify the distorted input image, and the feature representation is shared between the rectification and encoding steps. The model can be trained with scene text images and their corresponding word labels. With this flexible rectification and feature sharing, the model outperforms previous works in extensive evaluations on standard benchmarks, especially on irregular datasets: 80.2% on IC15 and 85.4% on CUTE.
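The abstract does not give the form of the polynomial transform, so the following is only a minimal NumPy sketch of how a polynomial-based spatial transformer can rectify an image, under stated assumptions. It assumes a bivariate polynomial of a chosen degree that maps each normalized output coordinate (x, y) in [-1, 1] to a source coordinate, followed by bilinear sampling. The coefficients `cx` and `cy` are passed in directly here; in the described model they would presumably be regressed by a network trained end to end. All function names (`exponents`, `poly_terms`, `identity_coeffs`, `polynomial_grid`, `bilinear_sample`, `rectify`) are hypothetical, and the sketch computes no gradients.

```python
import numpy as np


def exponents(degree):
    # (i, j) pairs for the monomials x**i * y**j with i + j <= degree (assumed basis).
    return [(i, j) for i in range(degree + 1) for j in range(degree + 1 - i)]


def poly_terms(x, y, degree):
    # Evaluate every monomial at each grid point; result has shape (..., K).
    return np.stack([x**i * y**j for i, j in exponents(degree)], axis=-1)


def identity_coeffs(degree):
    # Coefficients for which the polynomial map leaves the image unchanged.
    exps = exponents(degree)
    cx, cy = np.zeros(len(exps)), np.zeros(len(exps))
    cx[exps.index((1, 0))] = 1.0  # src_x = x
    cy[exps.index((0, 1))] = 1.0  # src_y = y
    return cx, cy


def polynomial_grid(cx, cy, out_h, out_w, degree):
    # Normalized output coordinates in [-1, 1]; each is mapped to a source point.
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    t = poly_terms(xs, ys, degree)  # shape (out_h, out_w, K)
    return t @ cx, t @ cy


def bilinear_sample(img, src_x, src_y):
    # Sample a 2-D grayscale image at normalized source coordinates.
    h, w = img.shape
    px = np.clip((src_x + 1) * (w - 1) / 2, 0, w - 1)
    py = np.clip((src_y + 1) * (h - 1) / 2, 0, h - 1)
    x0, y0 = np.floor(px).astype(int), np.floor(py).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = px - x0, py - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def rectify(img, cx, cy, out_h, out_w, degree=2):
    # Build the sampling grid from the polynomial, then resample the input.
    src_x, src_y = polynomial_grid(cx, cy, out_h, out_w, degree)
    return bilinear_sample(img, src_x, src_y)
```

With `identity_coeffs` the output reproduces the input, and negating `cx` mirrors it horizontally. A nonzero coefficient on the `x**2` term of `cy` shifts each column vertically by a parabolic amount, which is the kind of curvature a polynomial rectifier can straighten and a purely affine transform cannot.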