Luis F. Muñoz-Pérez, José Antonio Esquivel Guerrero, Jorge E. Macías-Díaz
In this work, we introduce an efficient method for the lossy compression of digitized documents. The method relies on a dictionary of class representatives defined by a minimum entropy criterion. The algorithm first identifies the symbols contained in a document image and then groups them into classes by means of a hierarchical clustering algorithm. For each class, a representative is selected using the principle of minimum entropy together with suitable similarity distances. The technique then creates a file in which every object belonging to a class is replaced by its class representative, and the resulting file is compressed. The performance of the proposed algorithm is assessed on digitized files at different resolutions, taken from a standard database for document compression, and compared against other state-of-the-art algorithms. The results show quantitatively that the proposed method is more efficient than the algorithms it is compared against.
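The pipeline described in the abstract (symbol extraction, hierarchical clustering, selection of a minimum-entropy representative, substitution) can be illustrated with a minimal sketch. The abstract does not specify the distance, the linkage rule, or the exact entropy criterion, so everything below is an assumption made for illustration: symbols are taken to be already segmented, equal-size, flattened binary bitmaps; the distance is the Hamming distance; the clustering is a naive single-linkage agglomeration stopped at a hypothetical `threshold`; and the representative is the class member whose pooled XOR residuals have the lowest binary entropy. All function names are hypothetical, and the final file-compression step is omitted.

```python
from math import log2

def hamming(a, b):
    """Number of differing pixels between two equal-size flattened bitmaps."""
    return sum(x != y for x, y in zip(a, b))

def cluster(symbols, threshold):
    """Naive single-linkage agglomerative clustering (assumed linkage rule).

    Returns a list of clusters, each a list of indices into `symbols`.
    Merging stops once the closest pair of clusters is farther apart
    than `threshold`.
    """
    clusters = [[i] for i in range(len(symbols))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(symbols[a], symbols[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

def binary_entropy(p):
    """Entropy in bits of a Bernoulli source with parameter p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def representative(symbols, members):
    """Pick the member whose residuals against its class have minimum entropy.

    Assumed criterion: pool the XOR residual bits between the candidate and
    every member, and score the candidate by the binary entropy of the
    fraction of differing pixels.
    """
    def residual_entropy(candidate):
        total = sum(len(symbols[m]) for m in members)
        diff = sum(hamming(symbols[candidate], symbols[m]) for m in members)
        return binary_entropy(diff / total)
    return min(members, key=residual_entropy)

def build_dictionary(symbols, threshold):
    """Return (dictionary, indices): one representative per class, plus the
    dictionary index that replaces each original symbol."""
    dictionary = []
    indices = [None] * len(symbols)
    for members in cluster(symbols, threshold):
        rep = representative(symbols, members)
        dictionary.append(symbols[rep])
        for m in members:
            indices[m] = len(dictionary) - 1
    return dictionary, indices
```

With this scoring, and as long as fewer than half of the pooled pixels differ, the minimum-entropy member coincides with the class medoid under the Hamming distance. The lossy step is the substitution itself: every symbol is stored only as an index into `dictionary`, so small differences between a symbol and its representative are discarded.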
Amandus Krantz, Florian Westphal
Arko Banerjee, Arun K. Pujari, Chhabi Rani Panigrahi, Bibudhendu Pati
Poornima Behl, Santanu Chaudhury, Brejesh Lall
Arthur C. Depoian, Ethan R. Adams, Colleen P. Bailey, Parthasarathy Guturu