JOURNAL ARTICLE

Pattern Matching in LZW Compressed Files

Tao Tao, Amar Mukherjee

Year: 2005; Journal: IEEE Transactions on Computers; Vol: 54 (8); Pages: 929-938; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Compressed pattern matching is an emerging research area that addresses the following problem: given a text file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW compressed files. The work includes an extension of Amir et al.'s well-known "almost-optimal" algorithm: the original algorithm, which reports only the first occurrence of the pattern, has been improved to find all other occurrences as well. A faster implementation for so-called "simple patterns" is also proposed. The work also includes a novel multiple-pattern matching algorithm based on the Aho-Corasick algorithm. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the total length of all patterns, t is the size of the LZW trie, and r is the number of occurrences of the patterns. Extensive experiments have been conducted to test the performance of our algorithms and to compare them with other well-known compressed pattern matching algorithms, particularly the BWT-based algorithms and a similar multiple-pattern matching algorithm by Kida et al. that also applies the Aho-Corasick algorithm to LZW compressed data. The results showed that our multiple-pattern matching algorithm is competitive with the best compressed pattern matching algorithms and is practically the fastest of all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. The proposed algorithm is efficient for large files, and it is particularly efficient when applied to archive search if the archives are compressed with a common LZW trie. LZW is one of the most efficient and widely used compression algorithms, and our method requires no modification of the compression algorithm. The work reported in this paper, therefore, has great economic and market potential.
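The abstract quotes the O(mt+n+r) time and O(mt) space bounds without showing the mechanism. The Python sketch below illustrates one generic way to run an Aho-Corasick automaton directly over LZW codes; it is not the authors' algorithm, and the function names `lzw_compress`, `build_aho_corasick` and `compressed_search` are hypothetical. It assumes a known alphabet whose symbols take the initial codes 0, 1, ..., and an LZW dictionary that is never reset. The scanner extends the LZW trie exactly as a decoder would, but for each trie node it stores only a row indexed by automaton state (the state reached after reading that node's phrase) and a pointer to the deepest prefix of the phrase whose end state reports a pattern. With at most m + 1 automaton states, these tables take space proportional to m times the number of trie nodes, each code is handled with table lookups, and every pointer followed during reporting yields at least one occurrence.

```python
from collections import deque


def lzw_compress(text, alphabet):
    """Plain LZW encoder: codes 0..len(alphabet)-1 are the single symbols; no dictionary reset."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes, w = [], ""
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = len(table)
            w = ch
    if w:
        codes.append(table[w])
    return codes


def build_aho_corasick(patterns, alphabet):
    """Aho-Corasick automaton with a complete transition table goto[state][symbol]."""
    goto, out, fail = [{}], [[]], [0]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append([])
                fail.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque()
    for ch in alphabet:
        if ch in goto[0]:
            queue.append(goto[0][ch])  # depth-1 states fail to the root
        else:
            goto[0][ch] = 0
    while queue:  # breadth-first, so fail[s] is finished before s
        s = queue.popleft()
        out[s] += out[fail[s]]  # inherit patterns that end at the failure state
        for ch in alphabet:
            t = goto[s].get(ch)
            if t is None:
                goto[s][ch] = goto[fail[s]][ch]
            else:
                fail[t] = goto[fail[s]][ch]
                queue.append(t)
    return goto, out


def compressed_search(codes, alphabet, patterns):
    """Return sorted (start, pattern) pairs, reading LZW codes without rebuilding the text."""
    goto, out = build_aho_corasick(patterns, alphabet)
    states = range(len(goto))  # at most m + 1 automaton states
    parent, first, depth = [], [], []  # LZW trie: parent node, first symbol, phrase length
    delta, last = [], []  # one row of len(states) per trie node -> O(mt) space

    def add_node(par, ch):
        """Add trie node phrase(par) + ch and fill its delta/last rows in O(m) time."""
        if par is None:
            row = [goto[s][ch] for s in states]
            prev_last = [None] * len(row)
            first.append(ch)
            depth.append(1)
        else:
            row = [goto[delta[par][s]][ch] for s in states]
            prev_last = last[par]
            first.append(first[par])
            depth.append(depth[par] + 1)
        node = len(parent)
        parent.append(par)
        delta.append(row)  # delta[node][s]: state after reading phrase(node) from s
        # last[node][s]: deepest prefix of phrase(node) whose end state has output, else None
        last.append([node if out[row[s]] else prev_last[s] for s in states])

    for ch in alphabet:
        add_node(None, ch)

    hits, state, pos, prev = [], 0, 0, None
    for c in codes:
        if prev is not None:
            # LZW decoder rule: new entry = phrase(prev) + first symbol of phrase(c).
            # If c is the entry being created (the KwKwK case), that symbol is first[prev].
            add_node(prev, first[c] if c < len(parent) else first[prev])
        node = last[c][state]
        while node is not None:  # every step reports at least one occurrence
            for p in out[delta[node][state]]:
                hits.append((pos + depth[node] - len(p), p))
            node = last[parent[node]][state] if parent[node] is not None else None
        state = delta[c][state]
        pos += depth[c]
        prev = c
    return sorted(hits)
```

For example, `compressed_search(lzw_compress("abababcab", "abc"), "abc", ["ab", "bab", "c"])` returns `[(0, 'ab'), (1, 'bab'), (2, 'ab'), (3, 'bab'), (4, 'ab'), (6, 'c'), (7, 'ab')]`, the same pairs a naive scan of the uncompressed text finds. Materializing a full row per trie node is the simplest way to make each code a constant-time lookup; a practical implementation would also have to handle dictionary caps or resets, which this sketch ignores.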

Keywords:
Pattern matching, Computer science, Trie, Matching (statistics), Algorithm, String searching algorithm, Pattern recognition (psychology), Data structure, Mathematics, Artificial intelligence

Metrics

Cited by: 19
FWCI (Field Weighted Citation Impact): 0.77
References: 15
Citation Normalized Percentile: 0.80

Topics

Algorithms and Data Compression (Physical Sciences → Computer Science → Artificial Intelligence)
Network Packet Processing and Optimization (Physical Sciences → Computer Science → Hardware and Architecture)
Advanced Data Storage Technologies (Physical Sciences → Computer Science → Computer Networks and Communications)

Related Documents

JOURNAL ARTICLE

Multiple-pattern matching for LZW compressed files

Tao Tao, Amar Mukherjee

Year: 2005; Vol: 1645; Pages: 91-96; Vol. 1

JOURNAL ARTICLE

Let sleeping files lie: pattern matching in Z-compressed files

Amihood Amir, Gary Benson, Martín Farach-Colton

Journal: Symposium on Discrete Algorithms; Year: 1994; Pages: 705-714

JOURNAL ARTICLE

Let Sleeping Files Lie: Pattern Matching in Z-Compressed Files

Amihood Amir, Gary Benson, Martín Farach-Colton

Journal: Journal of Computer and System Sciences; Year: 1996; Vol: 52 (2); Pages: 299-307

JOURNAL ARTICLE

Multiple-Pattern Matching In LZW Compressed Files Using Aho-Corasick Algorithm

Tao Tao, Amar Mukherjee

Journal: Data Compression Conference; Year: 2005; Pages: 482-482

BOOK-CHAPTER

Compressed Pattern Matching

Masayuki Takeda

Encyclopedia of Algorithms; Year: 2008; Pages: 171-174