BOOK-CHAPTER

Distributed Association Rule Mining

Abstract

Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful patterns (Mohammed, 1999). Because such data may contain terabytes of records, data mining relies on computationally efficient techniques to find patterns. It is related to exploratory data analysis, a subarea of statistics. During the past decade, data mining techniques have been used in various business, government, and scientific applications.

Association rule mining (Agrawal, Imielinski & Swami, 1993) is one of the most studied fields in the data mining domain. Its key strength is completeness: it can discover all associations within a given dataset. Two important constraints of association rule mining are support and confidence (Agrawal & Srikant, 1994), which are used to measure the interestingness of a rule. The motivation for association rule mining comes from market-basket analysis, which aims to discover customer purchase behavior. Its applications are not limited to market-basket analysis, however; it is also used in areas such as network intrusion detection and credit card fraud detection.

The widespread use of computers and advances in network technologies have enabled modern organizations to distribute their computing resources among different sites. The business applications used by such organizations normally store their day-to-day data at each respective site, and this data grows every day. Discovering useful patterns from such data with a centralized data mining approach is not always feasible, because merging datasets from different sites into a centralized site incurs large network communication costs (Ashrafi, David & Kate, 2004). Furthermore, the data are not only distributed over various locations but are also fragmented vertically, which makes combining them in a central location more difficult, if not impossible. As a result, Distributed Association Rule Mining (DARM) has emerged as an active subarea of data mining research.

Consider the following example. A supermarket may have several data centers spread over various regions across the country, each holding gigabytes of data. To find customer purchase behavior from these datasets, one can run an association rule mining algorithm at one of the regional data centers. However, mining a single data center will not yield all the potential patterns, because customer purchase patterns in one region differ from those in others. To obtain all potential patterns, we must rely on a distributed association rule mining algorithm that incorporates all data centers.

Distributed systems, by nature, require communication. Because distributed association rule mining algorithms generate rules from datasets spread over various geographical sites, they require external communication in every step of the process (Ashrafi, David & Kate, 2004; Assaf & Ron, 2002; Cheung, Ng, Fu & Fu, 1996). DARM algorithms therefore aim to reduce communication costs so that the total cost of generating global association rules is less than the cost of combining the datasets of all participating sites into a centralized site.
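
As a rough illustration of the supermarket example, the following Python sketch has each data center count itemsets locally and share only those counts, from which global support and confidence for one rule are derived. It assumes the usual definitions: support is the fraction of all transactions that contain every item in the rule, and confidence is that joint count divided by the number of transactions containing the antecedent. The site data and the helper names (`local_counts`, `merge_sites`, `rule_metrics`) are hypothetical and not taken from the chapter, and the brute-force enumeration is not one of the cited DARM algorithms.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items in one site's transactions."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # canonical order so keys match across sites
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def merge_sites(sites, max_size=2):
    """Sum per-site counts; only counts, not raw transactions, are combined."""
    total = Counter()
    n_transactions = 0
    for transactions in sites:
        total.update(local_counts(transactions, max_size))  # adds counts
        n_transactions += len(transactions)
    return total, n_transactions


def rule_metrics(counts, n_transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    lhs = tuple(sorted(set(antecedent)))
    support = counts[both] / n_transactions
    confidence = counts[both] / counts[lhs] if counts[lhs] else 0.0
    return support, confidence


# Hypothetical toy data: two regional data centers of one supermarket.
site_a = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]
site_b = [["milk"], ["bread", "milk"], ["butter"]]

counts, n = merge_sites([site_a, site_b])
support, confidence = rule_metrics(counts, n, ["bread"], ["milk"])
print(f"support={support:.2f} confidence={confidence:.2f}")
# prints: support=0.50 confidence=0.75
```

In this toy setup the rule bread -> milk appears in 3 of the 6 transactions overall, while bread appears in 4, so the global figures differ from what either site would report alone. A practical algorithm would also prune candidate itemsets so that fewer counts need to be exchanged between sites.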

Keywords:
Association rule learning; Association (psychology); Computer science; Data mining; Psychology

Topics

Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems

Related Documents

BOOK-CHAPTER

Distributed Association Rule Mining

Mafruz Zaman Ashrafi, David Taniar, Kate Smith‐Miles

IGI Global eBooks, Year: 2005, Pages: 403-407

BOOK-CHAPTER

Distributed Association Rule Mining

Mafruz Zaman Ashrafi

IGI Global eBooks, Year: 2009, Pages: 695-700

BOOK-CHAPTER

Towards Distributed Association Rule Mining Privacy

Mafruz Zaman Ashrafi, David Taniar, Kate Smith‐Miles

Advances in Intelligent Information Technologies (AIIT) book series, Year: 2007, Pages: 245-271