JOURNAL ARTICLE

Analysis of sampling techniques for association rule mining

Abstract

In this paper, we present a comprehensive theoretical analysis of the sampling technique for the association rule mining problem. Most previous work has concentrated only on empirical evaluation of the effectiveness of sampling for the step of finding frequent itemsets. To the best of our knowledge, a theoretical framework for analyzing the quality of the solutions obtained by sampling has not been studied. Our contributions are twofold. First, we present the notions of ɛ-close frequent itemset mining and ɛ-close association rule mining, which help assess the quality of the solutions obtained by sampling. Second, we show that both the frequent itemset mining and association rule mining problems can be solved satisfactorily with a sample size that is independent of both the number of transactions and the number of items. Let θ be the required support, ɛ the closeness parameter, and 1/h the desired bound on the probability of failure. We show that the sampling-based analysis succeeds in solving both ɛ-close frequent itemset mining and ɛ-close association rule mining with probability at least (1 − 1/h) with a sample of size $S = O\left(\frac{1}{\epsilon^{2}\theta}\left[\Delta + \log\frac{h}{(1-\epsilon)\theta}\right]\right)$, where ∆ is the maximum number of items present in any transaction. Thus, we establish that it is possible to speed up the entire process of association rule mining for massive databases by working with a small sample while retaining any desired degree of accuracy. Our work gives a comprehensive explanation for the well-known empirical successes of sampling for the step of finding frequent itemsets in association rule mining.
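To get a feel for how the bound behaves, the expression inside the O(·) can be evaluated numerically. The following is a minimal sketch, not the authors' procedure: the function name `sample_size_bound` and the multiplier `c` are hypothetical, `c` stands in for the constant hidden by the asymptotic notation (the abstract does not state it), and the natural logarithm is assumed.

```python
import math


def sample_size_bound(theta, eps, h, delta, c=1.0):
    """Evaluate c * (1 / (eps^2 * theta)) * (delta + ln(h / ((1 - eps) * theta))).

    theta : required support, in (0, 1]
    eps   : closeness parameter, in (0, 1)
    h     : 1/h is the desired bound on the failure probability, h > 1
    delta : maximum number of items in any transaction, >= 1
    c     : ASSUMPTION - placeholder for the constant hidden by O(.);
            1.0 is not a value taken from the paper.
    """
    if not (0 < theta <= 1 and 0 < eps < 1 and h > 1 and delta >= 1):
        raise ValueError("expected 0 < theta <= 1, 0 < eps < 1, h > 1, delta >= 1")
    log_term = math.log(h / ((1.0 - eps) * theta))  # natural log assumed
    return math.ceil(c * (delta + log_term) / (eps ** 2 * theta))


if __name__ == "__main__":
    # Illustrative parameters only: 10% support, 10% closeness,
    # failure probability at most 1/100, at most 10 items per transaction.
    print(sample_size_bound(theta=0.1, eps=0.1, h=100, delta=10))
```

Note that neither the number of transactions nor the number of items appears among the arguments, which mirrors the independence claim in the abstract; the value grows as ɛ or θ shrinks and only logarithmically in h.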

Keywords:
Association rule learning, Computer science, Data mining, Association (psychology), Sampling (signal processing), Artificial intelligence, Psychology, Telecommunications

Metrics

Cited By: 64
FWCI (Field Weighted Citation Impact): 14.31
Refs: 11
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Rough Sets and Fuzzy Logic
Physical Sciences →  Computer Science →  Computational Theory and Mathematics

Related Documents

JOURNAL ARTICLE

Sampling in association rule mining

Tsau Young Lin

Journal: Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE, Year: 2004, Vol: 5433, Pages: 161-161
JOURNAL ARTICLE

A new sampling technique for association rule mining

Basel A. Mahafzah, Amer Al‐Badarneh, Mohammed Z. Zakaria

Journal: Journal of Information Science, Year: 2009, Vol: 35 (3), Pages: 358-376