JOURNAL ARTICLE

Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data

Jingxuan He, Xixian Han, Xiaolong Wan, Jinbao Wang

Year: 2024 Journal: IEEE Transactions on Knowledge and Data Engineering Vol: 36 (7) Pages: 3009-3023 Publisher: IEEE Computer Society

Abstract

Frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are two important branches of itemset mining, a key technology for knowledge discovery in many applications. Many algorithms exist for FIM and HUIM, but few studies consider frequency and utility together, so skyline frequent-utility itemset mining (SFUIM) has been proposed to find useful itemsets under both frequency and utility measures. However, SFUIM is more challenging than FIM and HUIM: without any threshold, the search space is large and the computation is expensive, especially on large-scale databases. To address this, this paper proposes a novel prefix-based algorithm, PSI*, to mine skyline frequent-utility itemsets on massive data. PSI* divides the huge database by prefix-based partitioning, so that computing itemsets with a specific prefix item involves only a partition rather than the entire database. A multilevel-index-based list is presented to compactly maintain the maximal utility under the frequency constraint, and a novel grid-based structure is devised to organize partitions or items in a designed order. Moreover, four efficient pruning strategies are proposed to prune itemsets as early as possible. Extensive experiments show that PSI* outperforms the state-of-the-art algorithms, most noticeably on large-scale databases.
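
To make the notion of a skyline over frequency and utility concrete, the following is a minimal brute-force sketch in Python. It is not PSI* and uses none of its partitioning, index, grid, or pruning structures. It assumes the usual definitions from the SFUIM literature: the frequency of an itemset is the number of transactions containing it, its utility is the summed utility of its items over those transactions, and an itemset is in the skyline when no other itemset is at least as good on both measures and strictly better on one. The toy database and the function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (for example, quantity times unit profit).
toy_db = [
    {"a": 1, "b": 6},
    {"a": 1, "c": 4},
    {"a": 1, "b": 6, "c": 4},
    {"a": 1},
]


def frequency_and_utility(db):
    """Return {itemset: (frequency, utility)} for every itemset occurring in db.

    Exhaustive enumeration: exponential in transaction length, so this is
    only usable on tiny inputs.
    """
    stats = {}
    for transaction in db:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                freq, util = stats.get(itemset, (0, 0))
                gained = sum(transaction[item] for item in itemset)
                stats[itemset] = (freq + 1, util + gained)
    return stats


def skyline(stats):
    """Keep the itemsets whose (frequency, utility) pair is not dominated."""

    def dominated(point):
        # Another point dominates if it is >= on both measures and differs,
        # which means it is strictly greater on at least one of them.
        return any(
            other != point and other[0] >= point[0] and other[1] >= point[1]
            for other in stats.values()
        )

    return {itemset: fu for itemset, fu in stats.items() if not dominated(fu)}


result = skyline(frequency_and_utility(toy_db))
print(result)
```

On this toy input the skyline contains the itemset {a}, which has the highest frequency (4) but low utility (4), and {a, b}, which has lower frequency (2) but the highest utility (14). No single threshold is needed to select them, which is also why the search space cannot be cut by a threshold in the way FIM and HUIM allow.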

Keywords:
Skyline, Computer science, Data mining, Partition (number theory), Key (lock), Pruning, Prefix, Database, Algorithm, Mathematics

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 9.17
Refs: 23
Citation Normalized Percentile: 0.95

Topics

Data Mining Algorithms and Applications
Physical Sciences → Computer Science → Information Systems
Rough Sets and Fuzzy Logic
Physical Sciences → Computer Science → Computational Theory and Mathematics
Data Management and Algorithms
Physical Sciences → Computer Science → Signal Processing

Related Documents

JOURNAL ARTICLE

Efficient high-utility occupancy itemset mining algorithm on massive data

Jingxuan He, Xixian Han, Jinbao Wang, Kaiqi Zhang

Journal: Expert Systems with Applications Year: 2022 Vol: 210 Pages: 118329-118329
JOURNAL ARTICLE

Efficient Top-k Frequent Itemset Mining on Massive Data

Xiaolong Wan, Xixian Han

Journal: Data Science and Engineering Year: 2024 Vol: 9 (2) Pages: 177-203
JOURNAL ARTICLE

Efficient top-k high utility itemset mining on massive data

Xixian Han, Xianmin Liu, Jianzhong Li, Hong Gao

Journal: Information Sciences Year: 2020 Vol: 557 Pages: 382-406