Finding all frequent patterns in a set of strings is computationally intensive. An exhaustive search, which considers every possible candidate, is impractical for larger pattern widths because its computational complexity is exponential. Other approaches apply heuristics that reduce the search space but may compromise the accuracy of the results to some extent. We used a modified Apriori algorithm to mine patterns in a very long sequence, in particular the most frequent substring pattern of a fixed length in a biological sequence. The algorithm performs well because it rapidly reduces the search space and uses bit-wise operations instead of expensive string comparisons. It outperforms existing pattern-finding methods such as MEME in terms of execution time.
S. Vijayalakshmi, S. Suresh Raja
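
The two ideas named in the abstract, level-wise (Apriori-style) pruning and bit-wise comparison, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a DNA alphabet of A, C, G, T packed at 2 bits per symbol, and the function names (`count_kmers`, `frequent_kmers`, `decode`) and the `min_support` threshold are hypothetical choices made for illustration. Each window is held as an integer that is updated with a shift and a mask, so comparing substrings becomes an integer comparison; a width-w window is counted only if its (w-1)-symbol prefix was frequent at the previous level.

```python
from collections import Counter

# Assumed 2-bit code per nucleotide; other symbols are not handled in this sketch.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(seq, k, allowed_prefixes=None):
    """Count length-k windows of seq, each encoded as a 2k-bit integer.

    If allowed_prefixes is given, a window is counted only when the code
    of its first k-1 symbols is in that set (Apriori-style pruning).
    """
    mask = (1 << (2 * k)) - 1          # keeps only the last k symbols
    counts = Counter()
    code = 0
    for i, ch in enumerate(seq):
        # Slide the window: shift in the new symbol, drop the oldest one.
        code = ((code << 2) | CODE[ch]) & mask
        if i >= k - 1:
            # code >> 2 is the integer code of the window's (k-1)-prefix.
            if allowed_prefixes is None or (code >> 2) in allowed_prefixes:
                counts[code] += 1
    return counts


def decode(code, k):
    """Turn a 2k-bit integer code back into its nucleotide string."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - j))) & 3] for j in range(k))


def frequent_kmers(seq, k, min_support):
    """Return {substring: count} for length-k substrings (k >= 1)
    occurring at least min_support times, growing the width level by level."""
    frequent = None
    counts = Counter()
    for width in range(1, k + 1):
        counts = count_kmers(seq, width, frequent)
        frequent = {c for c, n in counts.items() if n >= min_support}
        if not frequent:
            return {}                  # no candidate survives; stop early
    return {decode(c, k): counts[c] for c in frequent}


if __name__ == "__main__":
    result = frequent_kmers("ACGTACGTACGA", 3, 2)
    # Most frequent substring of length 3 among the survivors.
    print(max(result.items(), key=lambda item: item[1]))
```

The pruning is safe because every occurrence of a substring is also an occurrence of its prefix, so a substring cannot reach the support threshold unless its prefix does. Raising `min_support` shrinks the candidate set at each level, which is where the reduction in search space comes from in this sketch.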