YongWei Wang, Shuangshuang Xu, Yao Xiao, YiGuang Yang, Hao Li, RongZheng Yang
This paper proposes a document image rectification method that incorporates semantic segmentation. It addresses two problems: traditional correction methods have limited applicability, and training data are difficult to annotate. First, diverse experimental data are synthesized, yielding target document images together with their corresponding masks. Second, DeepLabV3 semantic segmentation models are built on pre-trained MobileNetV3, ResNet50, and ResNet101 backbones, respectively, to separate the document page area (the region of interest, ROI) from the rest of the image. Finally, the page area is rectified by corner point detection followed by a perspective transformation, which completes the document image rectification. Evaluation shows that the DeepLabV3 model with the MobileNetV3 backbone processes images efficiently and reaches an IoU of 0.983 and a total loss of 0.064 on the validation set. Test results demonstrate that the proposed method has better generalization capability and can be readily extended to practical engineering applications.
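The abstract gives no implementation details for the post-segmentation step, so the following is only a minimal sketch under stated assumptions. It assumes the segmentation model has already produced a binary page mask, stored as a 2-D list of 0/1 values. The sketch picks four page corners with a simple sum/difference heuristic, which is an assumption and not necessarily the authors' corner detector. It then fits a 3x3 perspective (homography) matrix that maps those corners to an upright rectangle, and computes mask IoU, the metric reported above. All function names (`mask_iou`, `mask_corners`, `homography`, `apply_h`) are hypothetical. The code is written in pure Python for clarity. In practice, OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would fit the matrix and resample the image.

```python
"""Hypothetical sketch of mask IoU, corner picking, and a perspective transform.

Assumptions: masks are 2-D lists of 0/1 values; corners come from a
sum/difference heuristic, not necessarily the detector used in the paper.
"""
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally shaped binary masks."""
    inter = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0  # two empty masks count as a match


def mask_corners(mask: Mask) -> List[Point]:
    """Pick four page corners from the foreground pixels (assumed heuristic).

    Top-left minimises x + y, bottom-right maximises x + y,
    top-right maximises x - y, bottom-left minimises x - y.
    Returns [top-left, top-right, bottom-right, bottom-left].
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    tr = max(pts, key=lambda p: p[0] - p[1])
    bl = min(pts, key=lambda p: p[0] - p[1])
    return [tl, tr, br, bl]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 perspective matrix H (with h33 = 1) mapping four src points to dst.

    From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1) and the analogous
    expression for v, each point pair gives two linear equations in h0..h7.
    """
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(h: List[List[float]], p: Point) -> Point:
    """Map a point through H using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    # Toy 6x8 mask whose page region spans x = 2..5, y = 1..3.
    page = [[1 if 2 <= x <= 5 and 1 <= y <= 3 else 0 for x in range(8)]
            for y in range(6)]
    corners = mask_corners(page)
    print(corners)  # [(2, 1), (5, 1), (5, 3), (2, 3)]
    target = [(0, 0), (300, 0), (300, 200), (0, 200)]
    H = homography(corners, target)
    print([apply_h(H, p) for p in corners])  # approximately the target corners
```

Mapping the corners to a fixed rectangle is the simplest choice. A fuller pipeline would usually derive the output width and height from the distances between the detected corners. It would then resample every pixel through the inverse matrix, which is the job `warpPerspective` performs.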
Dhanya M. Dhanalakshmy, Hema P Menon