JOURNAL ARTICLE

Entropy-Based Selection of Cluster Representatives for Document Image Compression

Luis F. Muñoz-Pérez, José Antonio Esquivel Guerrero, Jorge E. Macías-Díaz

Year: 2019
Journal: SIAM Journal on Imaging Sciences
Vol: 12 (4)
Pages: 1720-1738
Publisher: Society for Industrial and Applied Mathematics

Abstract

In this work, we introduce an efficient method for the lossy compression of digitized documents. The method uses a dictionary of class representatives selected by a minimum-entropy criterion. The algorithm first identifies the symbols contained in a document image and then groups them into classes by means of a hierarchical clustering algorithm. For each class, a representative is selected using the principle of minimum entropy together with suitable similarity distances. The technique then creates a file in which every object belonging to a class is replaced by its class representative, and finally this file is compressed. The performance of the proposed algorithm is assessed on digitized files from a standard document-compression database at different resolutions, and the method is compared against other state-of-the-art algorithms. The results show quantitatively that the proposed methodology is more efficient than the algorithms it is compared against.
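The pipeline summarized in the abstract (symbol extraction aside) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: symbols are taken to be already segmented and scaled to equal-size binary bitmaps, the similarity distance is the normalized Hamming distance, clustering uses average linkage with a fixed merge threshold, and the "minimum entropy" rule is interpreted hypothetically as choosing the class member whose XOR residuals against the other members have the lowest total estimated entropy. All function names are illustrative, and the final compression stage is omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each symbol is a flattened binary bitmap,
# and all symbols are assumed to be pre-scaled to the same size.
Bitmap = Tuple[int, ...]


def hamming(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixel positions at which two equal-sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits per pixel) of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def cluster(symbols: Sequence[Bitmap], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering of symbol indices.

    Merging stops once the two closest clusters are farther apart than
    `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(symbols))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise Hamming distance between the two clusters.
        total = sum(hamming(symbols[i], symbols[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_d > threshold:
            break
        # best_i < best_j, so deleting best_j leaves best_i's index intact.
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]
    return clusters


def residual_bits(candidate: Bitmap, members: Sequence[Bitmap]) -> float:
    """Estimated bits to code each member's XOR residual w.r.t. `candidate`,
    modelling every residual as a memoryless binary source."""
    n = len(candidate)
    return sum(n * binary_entropy(hamming(candidate, m)) for m in members)


def select_representative(members: Sequence[Bitmap]) -> Bitmap:
    """Member whose residuals have minimum total entropy (hypothetical rule)."""
    return min(members, key=lambda c: residual_bits(c, members))


def substitute(symbols: Sequence[Bitmap], threshold: float) -> List[Bitmap]:
    """Lossy step: replace every symbol by its class representative."""
    out = list(symbols)
    for group in cluster(symbols, threshold):
        rep = select_representative([symbols[i] for i in group])
        for i in group:
            out[i] = rep
    return out
```

With three slightly different 3x3 "o" bitmaps, one "l" bitmap, and a threshold of 0.3, the three "o" variants form one class and are all replaced by the variant that differs least from the other two, while the "l" stays in a class of its own. Because the binary entropy is concave, this criterion does not always rank candidates the same way as a plain sum of Hamming distances would.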

Keywords:
Lossy compression, Computer science, Entropy, Cluster analysis, Image compression, Lossless compression, Data compression, Pattern recognition, Data mining, Algorithm, Artificial intelligence, Image, Image processing

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 24
Citation Normalized Percentile: 0.12

Topics

Advanced Data Compression Techniques
Image Retrieval and Classification Techniques
Advanced Image and Video Retrieval Techniques
Field (all topics): Physical Sciences → Computer Science → Computer Vision and Pattern Recognition