Niklas JahnssonBrandon MalonePetri Myllymäki
In this work, we address the well-known score-based Bayesian network structure learning problem. Breadth-first branch and bound (BFBnB) has been shown to be an effective approach for solving this problem. Delayed duplicate detection (DDD) is an important component of the BFBnB algorithm. Previously, an external sorting-based technique, with complexity $${\text {O}}\left( m \log m\right) $$ , where m is the number of nodes stored in memory, was used for DDD. In this work, we propose a hashing-based technique, with complexity $${\text {O}}\left( m\right) $$ , for DDD. In practice, by removing the $${\text {O}}\left( \log m\right) $$ overhead of sorting, over an order of magnitude more memory is available for the search. Empirically, we show the extra memory improves locality and decreases the amount of expensive external memory operations. We also give a bin packing algorithm for minimizing the number of external memory files.
Niklas JahnssonBrandon MalonePetri Myllymäki
Sonu JoseSushil J. LouisSergiu M. DascaluSiming Liu
Shan AnZhibiao HuangYu ChenDennis Weng
Hao WangYiwei ChenPeng WangXiaoyan LiRuohai Di
Xingping SunChang ChenLu WangHongwei KangYong ShenQingyi Chen