Zhiqiang Tian, Chunhui Wang, Youzi Xiao, Yuping Lin
Summary: Scene text recognition (STR), which extracts text from complex natural scenes, is an active topic in computer vision. In this article, we propose a flexible, end-to-end trainable STR method based on a dual attention mechanism. The proposed method consists of four modules: a thin plate spline transformer that normalizes the original image, a Channel-Att feature extractor that obtains representative features, a bidirectional long short-term memory encoder that encodes sequential context features, and a Self-Att based decoder that predicts text labels. Results on seven benchmark datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) show that the proposed method is competitive with 13 existing methods. In particular, its average text recognition accuracy is about 1.4% higher than that of the state-of-the-art method.
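The summary does not describe the internals of the Channel-Att feature extractor. As a rough illustration of what channel attention generally does, the sketch below follows the common squeeze-and-excitation pattern: average each channel, pass the averages through a small gating network, and rescale the channels by the resulting weights. This is an assumption for illustration, not the authors' design; the function name `channel_attention` and the weight matrices `w1` and `w2` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative only).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: R x C weight matrix (hypothetical reduction layer).
    w2: C x R weight matrix (hypothetical expansion layer).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: linear layer + ReLU, then linear layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

In a trained network the two weight matrices would be learned jointly with the rest of the model, so that informative channels receive gates close to 1 and less useful ones are suppressed.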
Hui Wang, Tao Hu, Xiaowei Geng, Kai Li
Zheng Xiao, Zhenyu Nie, Chao Song, Anthony T. Chronopoulos
Xiang Shuai, Xiao Wang, Wei Wang, Xin Yuan, Xin Xu
Xinjian Gao, Ye Pang, Yuyu Liu, Jun Yu, Maokun Han, Kai Hou, Wei Wang
Haodong Yang, Shuo Li, Xiaoqing Yin, Anqi Han, Jun Zhang