This paper focuses on the rectification of camera-captured document images with varied layouts of mixed contents. Document images acquired with cameras, including smartphones, are typically affected by perspective, geometric, and/or rotational distortion, which hinders document analysis. We propose an approach to rectifying text and non-text regions in camera-captured images that handles the perspective, geometric, and rotational distortions present in planar and curled documents, extending a state-of-the-art content-based rectification method. We define surface projections via a three-tiered local transformation model: primary curved surface projections are formed from individual text regions, while secondary and tertiary surface projections are formed from non-text regions, resulting in a 'patchwork' combination of surfaces spanning the document image. This transformation model allows us to process document images with varied layouts of mixed contents, including large images and graphics, that also contain some justified text. Experiments and comparisons with a state-of-the-art content-based rectification approach on the public IUPR dataset demonstrate the value of the proposed approach on two levels: 1) significantly improved rectification performance on standard optical character recognition metrics, along with increased document readability, and 2) a wider range of applicability, i.e., the ability to correct document images with various layouts and content types.
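As an illustration of how rectification quality can be scored with an OCR-based metric, the following minimal sketch computes a character error rate from the Levenshtein edit distance between a ground-truth transcription and the OCR output. This is an assumption about what such a metric may look like, not the evaluation protocol used in the experiments; the function names `edit_distance` and `character_error_rate` and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # previous[j] is the distance between the reference prefix processed
    # so far and hypothesis[:j]; it starts as the distance from "".
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # distance from reference[:i] to the empty string
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcriptions, for illustration only (not real results).
    ground_truth = "justified text"
    ocr_before = "justifed texl"   # OCR output on a distorted image
    ocr_after = "justified text"   # OCR output after rectification
    print(character_error_rate(ground_truth, ocr_before))
    print(character_error_rate(ground_truth, ocr_after))
```

Under this sketch, a lower character error rate on the rectified image than on the distorted one would indicate that rectification helped the OCR engine; other OCR metrics, such as word-level accuracy, could be substituted in the same way.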
Jian Liang, Daniel DeMenthon, David Doermann
Debanshu Banerjee, Pratik Bhowal, Suman Kumar Bera, Ram Sarkar