Dr. K. Siva Kumar, Chinnam Lavanya, Cheedella Sai Pranavi, Addagatla Sagari Sailaja Kumari
Recognizing scene text under irregular distortions demands robust rectification prior to decoding. We propose a Two-Level Rectification Attention Network (TRAN) that unites a Geometry-Level Rectification Network (GEO), which uses thin-plate spline (TPS) warping to correct global skew and curvature, with a Pixel-Level Rectification Network (PIX) that applies fine-grained per-pixel offsets to refine local deformations. To handle diverse character scales and appearances, we introduce a Channel-Kernel Attention Unit that dynamically weights feature channels and convolutional kernels. TRAN is implemented in PyTorch atop the ClovaAI deep-text-recognition-benchmark framework with pretrained CNN–RNN backbones. Large-scale experiments on benchmarks with curved, rotated, and perspective-warped text show that TRAN's two-stage rectification strategy outperforms single-stage rectification algorithms in both rectification quality and recognition accuracy. Our results point to combining multi-level rectification with adaptive attention as a promising direction for more robust scene text recognition in real-world applications such as navigation systems and reading aid devices.
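The pixel-level step described above, resampling an image through per-pixel offsets, can be illustrated with a minimal pure-Python sketch. This is not the TRAN implementation: the function names `bilinear_sample` and `apply_pixel_offsets` are hypothetical, the offsets are supplied by hand rather than predicted by a network, and border clamping is an assumption made here for simplicity. In a PyTorch model the same role is typically played by a differentiable sampler such as `torch.nn.functional.grid_sample`.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a 2D grayscale image (list of rows) at fractional (x, y).

    Coordinates outside the image are clamped to the border (an assumption
    of this sketch, not something stated in the source).
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def apply_pixel_offsets(img, offsets):
    """Return out where out[y][x] = img sampled at (x + dx, y + dy).

    offsets[y][x] is a (dx, dy) pair; in a learned model these would come
    from an offset-prediction branch (hypothetical here).
    """
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, x + offsets[y][x][0], y + offsets[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]

# Usage: zero offsets leave the image unchanged; a uniform (+1, 0) offset
# makes every output pixel read from its right-hand neighbour.
image = [[0, 1, 2],
         [3, 4, 5]]
zero = [[(0.0, 0.0)] * 3 for _ in range(2)]
shift = [[(1.0, 0.0)] * 3 for _ in range(2)]
identity = apply_pixel_offsets(image, zero)
shifted = apply_pixel_offsets(image, shift)
```

Because each output pixel carries its own offset, this kind of resampling can correct local deformations that a single global transform such as TPS leaves behind, which is the motivation the abstract gives for pairing the two stages.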