JOURNAL ARTICLE

Efficient Top-k Frequent Itemset Mining on Massive Data

Xiaolong Wan, Xixian Han

Year: 2024   Journal: Data Science and Engineering   Vol: 9 (2)   Pages: 177-203   Publisher: Springer Science+Business Media

Abstract

Top-k frequent itemset mining (top-k FIM) plays an important role in many practical applications. It reports the k itemsets with the highest supports. Unlike FIM, which requires a minimum support threshold that is hard to choose well, top-k FIM needs only the more intuitive parameter of the number of results. Existing algorithms require at least two scans of the table and incur high execution cost on massive data. This paper develops PTF, a prefix-partitioning-based algorithm that mines top-k frequent itemsets efficiently, where each prefix-based partition keeps the transactions sharing the same prefix item. PTF can directly skip most of the partitions, namely those that cannot generate any top-k frequent itemsets. Vertical mining is developed to process the partitions in vertical representation under the high-support-first principle, so that only a small fraction of the items are involved in processing the partitions. Two improvements are proposed to reduce execution cost further: a hybrid vertical storage mode maintains the prefix-based partitions adaptively, and candidate pruning reduces the number of explored candidates. Extensive experimental results show that, on massive data, PTF achieves a speedup of up to 1348.53 times and incurs up to 355.31 times less I/O cost than the state-of-the-art algorithms.
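To make the problem statement concrete, the following is a minimal in-memory sketch of top-k frequent itemset mining over a vertical (item to transaction-id set) representation, visiting items in high-support-first order and growing itemsets by prefix. It is not the PTF algorithm from the paper: it has no on-disk partitions, no hybrid storage mode, and only a basic pruning rule. The function name `top_k_itemsets` and the sample data are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def top_k_itemsets(transactions, k):
    """Return up to k (support, itemset) pairs with the highest supports.

    Illustrative sketch only; ties at the k-th support are broken arbitrarily.
    """
    # Vertical representation: item -> ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[item].add(tid)

    # High-support-first order; ties broken by item value for determinism.
    items = sorted(tidsets, key=lambda i: (-len(tidsets[i]), i))

    heap = []  # min-heap holding the best k (support, itemset) pairs so far

    def threshold():
        # Support a new itemset must exceed once the heap is full.
        return heap[0][0] if len(heap) == k else 0

    def offer(support, itemset):
        if len(heap) < k:
            heapq.heappush(heap, (support, itemset))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, itemset))

    def extend(prefix, prefix_tids, start):
        for pos in range(start, len(items)):
            item = items[pos]
            # Items are sorted by support, so once one item cannot beat the
            # threshold, no later item (or any extension with it) can either.
            if len(tidsets[item]) <= threshold():
                break
            tids = prefix_tids & tidsets[item]
            # Support is anti-monotone: supersets never have higher support,
            # so an itemset at or below the threshold is not extended.
            if len(tids) > threshold():
                itemset = prefix + (item,)
                offer(len(tids), itemset)
                extend(itemset, tids, pos + 1)

    extend((), set(range(len(transactions))), 0)
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))


sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
print(top_k_itemsets(sample, 3))
# → [(4, ('a',)), (3, ('b',)), (3, ('c',))]
```

The running threshold is the smallest support currently in the heap, so it rises as better itemsets are found. This stands in for the user-specified minimum support of ordinary FIM, and it is the reason the only parameter the caller supplies is k.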

Keywords:
Computer science, Data mining, Information retrieval

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 18.33
Refs: 52
Citation Normalized Percentile: 0.98
Is in top 1%
Is in top 10%

Topics

Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems
Data Management and Algorithms
Physical Sciences →  Computer Science →  Signal Processing
Rough Sets and Fuzzy Logic
Physical Sciences →  Computer Science →  Computational Theory and Mathematics

Related Documents

JOURNAL ARTICLE

Efficient top-k high utility itemset mining on massive data

Xixian Han, Xianmin Liu, Jianzhong Li, Hong Gao

Journal: Information Sciences   Year: 2020   Vol: 557   Pages: 382-406

JOURNAL ARTICLE

Mining Top-K Frequent Closed Itemset in Data Streams

Jun Li, Hou Xiu-hong, Sen Gong

Journal: Energy Procedia   Year: 2011   Vol: 11   Pages: 594-601

JOURNAL ARTICLE

Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data

Jingxuan He, Xixian Han, Xiaolong Wan, Jinbao Wang

Journal: IEEE Transactions on Knowledge and Data Engineering   Year: 2024   Vol: 36 (7)   Pages: 3009-3023