Symbolic document image compression based on pattern matching techniques

Chwan-Yi Shiah; Yun‐Sheng Yen

doi:10.1117/12.913413

JOURNAL ARTICLE

Year: 2011

Journal: Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE

Vol: 8285

Pages: 82851D-82851D

Publisher: SPIE

Abstract

This paper proposes a novel compression algorithm for Chinese document images. Documents are first segmented into readable components such as characters and punctuation marks. Similar patterns within the text are found by shape context matching and grouped to form a set of prototype symbols. Text redundancy is removed by replacing repeated symbols with their corresponding prototype symbols. To keep the compression visually lossless, a multi-stage symbol clustering procedure groups similar symbols while ensuring that there is no visible error in the decompressed image. In the encoding phase, the resulting data streams are encoded by adaptive arithmetic coding. The results show that the average compression ratio is better than that of the international standard JBIG2, and that the compressed form of a document image is suitable for content-based keyword searching.
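The prototype-replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a plain pixel-mismatch ratio stands in for shape context matching, a single greedy pass stands in for the multi-stage clustering procedure, and the names `mismatch`, `build_prototypes`, and the `threshold` value are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# A symbol bitmap: each row is a string of '0'/'1' pixels (assumed representation).
Bitmap = Tuple[str, ...]


def mismatch(a: Bitmap, b: Bitmap) -> float:
    """Fraction of differing pixels; infinity when the two bitmaps differ in size.

    This is a simple stand-in for shape context matching, not the paper's measure.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return float("inf")
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def build_prototypes(
    symbols: List[Bitmap], threshold: float = 0.1
) -> Tuple[List[Bitmap], List[int]]:
    """Greedy single-pass clustering of symbols into prototypes.

    Returns the prototype list and, for each input symbol, the index of the
    prototype that replaces it. The threshold is an illustrative assumption.
    """
    prototypes: List[Bitmap] = []
    indices: List[int] = []
    for sym in symbols:
        for i, proto in enumerate(prototypes):
            if mismatch(sym, proto) <= threshold:
                indices.append(i)  # reuse an existing prototype
                break
        else:
            prototypes.append(sym)  # no close match: the symbol becomes a new prototype
            indices.append(len(prototypes) - 1)
    return prototypes, indices
```

In a scheme of this kind, the prototype bitmaps are stored once and the page is represented by the index stream (plus symbol positions), which is the kind of data an entropy coder such as adaptive arithmetic coding would then compress. A tighter threshold keeps more distinct prototypes and reduces the risk of visible substitution errors at the cost of compression.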

Keywords:

Computer science; Lossless compression; Data compression; Image compression; Artificial intelligence; Pattern recognition (psychology); Cluster analysis; Compression ratio; Coding (social sciences); Arithmetic coding; Huffman coding; Pattern matching; Symbol (formal); Encoding (memory); Computer vision; Image (mathematics); Context-adaptive binary arithmetic coding; Image processing; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.11

Topics

Algorithms and Data Compression

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Data Compression Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Retrieval and Classification Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Entropy-based pattern matching for document image compression

Document image compression via pattern matching.

Content-lossless document image compression based on structural analysis and pattern matching

Pattern matching image compression

Compression of Chinese document images based on morphologic analysis and pattern matching