Scene text recognition (STR) is the recognition of text instances in natural scenes. Although STR methods have made great progress, texts that are arbitrarily shaped, severely bent, or rotated remain difficult to recognize. To address the difficulty of recognizing irregular texts in complex natural scenes, this paper proposes a scene text recognition method that combines the self-attention mechanism with a transformer encoder-decoder framework to form a text recognition network. The model uses a super-resolution unit to improve the recognition rate on low-quality images, uses the self-attention mechanism to describe the two-dimensional spatial correlation of characters in scene text images, and can recognize arbitrarily irregular texts. On the "irregular texts" benchmark, the average performance of the proposed model is better than that of most existing STR models, with an average recognition accuracy of more than 84.3%.
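As an illustration of how self-attention can relate every position of a two-dimensional feature map to every other position, the following is a minimal sketch in plain Python. It is not the network proposed in this paper: the function name `self_attention_2d` is hypothetical, the learned query/key/value projections and positional encodings of a real transformer are omitted (identity projections are assumed), and the input values are made up for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_2d(feature_map):
    """Scaled dot-product self-attention over an H x W x C feature map.

    Assumption: queries, keys and values are the raw features themselves
    (no learned projections, no positional encoding).
    Returns (output with shape H x W x C, weights with shape N x N),
    where N = H * W.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    # Flatten the 2D grid into N tokens in row-major order.
    tokens = [feature_map[r][c] for r in range(height) for c in range(width)]
    dim = len(tokens[0])
    scale = math.sqrt(dim)

    # Each row of `weights` says how much one position attends to all others.
    weights = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights.append(softmax(scores))

    # Weighted sum of the value vectors for every position.
    attended = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    # Restore the H x W layout.
    output = [attended[r * width:(r + 1) * width] for r in range(height)]
    return output, weights


# Toy 2 x 3 feature map with 2 channels (illustrative values only).
feature_map = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]],
]
output, weights = self_attention_2d(feature_map)
```

Because the attention weights are computed between all pairs of grid positions, a character region in one row can draw on features from any other row or column, which is the property that makes this kind of mechanism attractive for curved or rotated text.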
Subhashini Peneti, N. Thulasi Chitra
S. Prabu, K. Joseph Abraham Sundar
Ling-Qun Zuo, Hong-Mei Sun, Qi-Chao Mao, Rong Qi, Rui-Sheng Jia
Meiling Li, Xiumei Li, Junmei Sun, Yujin Dong
Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang