BOOK-CHAPTER

Hashing-Based Hybrid Duplicate Detection for Bayesian Network Structure Learning

Niklas Jahnsson, Brandon Malone, Petri Myllymäki

Year: 2015   Series: Lecture Notes in Computer Science   Pages: 46-60   Publisher: Springer Science+Business Media

Abstract

In this work, we address the well-known score-based Bayesian network structure learning problem. Breadth-first branch and bound (BFBnB) has been shown to be an effective approach for solving this problem. Delayed duplicate detection (DDD) is an important component of the BFBnB algorithm. Previously, an external sorting-based technique, with complexity $O(m \log m)$, where $m$ is the number of nodes stored in memory, was used for DDD. In this work, we propose a hashing-based technique, with complexity $O(m)$, for DDD. In practice, by removing the $O(\log m)$ overhead of sorting, over an order of magnitude more memory is available for the search. Empirically, we show the extra memory improves locality and decreases the amount of expensive external memory operations. We also give a bin packing algorithm for minimizing the number of external memory files.
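
The contrast between the two duplicate-detection strategies in the abstract can be illustrated with a minimal in-memory sketch. It is not the chapter's implementation: the node representation (a bitmask of variables paired with a score), the convention that a lower score is better, and the function names are all assumptions made for illustration, and the external-memory side of DDD is left out entirely.

```python
from itertools import groupby

# Hypothetical node representation (an assumption, not from the chapter):
# (subset_mask, score), where subset_mask is a bitmask identifying the node
# and a lower score is taken to be better.

def dedupe_by_sorting(nodes):
    """Sort by key, then keep the best node in each run of equal keys.
    Costs O(m log m) comparisons for m nodes."""
    ordered = sorted(nodes, key=lambda n: n[0])
    return [min(group, key=lambda n: n[1])
            for _, group in groupby(ordered, key=lambda n: n[0])]

def dedupe_by_hashing(nodes):
    """Single pass with a hash table that keeps the best score per key.
    Costs O(m) expected time for m nodes."""
    best = {}
    for mask, score in nodes:
        if mask not in best or score < best[mask]:
            best[mask] = score
    return list(best.items())

# A small batch containing duplicates of masks 0b011 and 0b101.
nodes = [(0b011, 12.5), (0b101, 9.0), (0b011, 11.0), (0b110, 7.5), (0b101, 9.5)]
by_sort = dedupe_by_sorting(nodes)
by_hash = dedupe_by_hashing(nodes)
```

Both functions retain one node per distinct key; the hashing variant avoids the sort and does its work in a single pass over the batch.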

Keywords:
Computer science, Auxiliary memory, Sorting, Hash function, Overhead (engineering), Bin, Algorithm, Upper and lower bounds, Memory management, Data structure, Bayesian network, Locality-sensitive hashing, Theoretical computer science, Artificial intelligence, Hash table, Overlay, Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.33
Refs: 39
Citation Normalized Percentile: 0.60

Topics

Bayesian Modeling and Causal Inference
Physical Sciences →  Computer Science →  Artificial Intelligence
Data Quality and Management
Social Sciences →  Decision Sciences →  Management Science and Operations Research
Data Management and Algorithms
Physical Sciences →  Computer Science →  Signal Processing