Privacy-preserving record linkage (PPRL) is the process of identifying records that correspond to the same entities across several databases without revealing any sensitive information about these entities. One popular PPRL technique is Bloom filter (BF) encoding, and BF-based PPRL is now being deployed in the first real-world linkage applications. Here we present a cryptanalysis attack that can re-identify attribute values encoded in BFs. Our method first applies maximal frequent itemset mining to a BF database to identify sets of frequently co-occurring bit positions that correspond to encoded frequent q-grams (character substrings extracted from plain-text values). Using a language model, we then identify additional q-grams by applying pattern mining to subsets of BFs that encode a previously identified frequent q-gram. Experiments on a real database show that our attack can successfully re-identify sensitive values even when each BF in the database is unique.
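The following minimal Python sketch illustrates why a frequent q-gram leaves a trace of co-occurring bit positions in a BF database. It is not the attack described above: it only counts per-position support rather than performing maximal frequent itemset mining, and it uses no language model. The double-hashing scheme, the filter length, the number of hash functions, the toy name list, and all function names are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def qgrams(value, q=2):
    """Return the set of character q-grams of a string."""
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def positions(gram, bf_len=1000, num_hash=2):
    """Bit positions set by one q-gram (assumed double-hashing scheme)."""
    h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
    h2 = int(hashlib.md5(gram.encode()).hexdigest(), 16)
    return {(h1 + i * h2) % bf_len for i in range(num_hash)}


def encode(value, bf_len=1000, num_hash=2):
    """Encode a string as the set of 1-bit positions of its Bloom filter."""
    bits = set()
    for gram in qgrams(value):
        bits |= positions(gram, bf_len, num_hash)
    return frozenset(bits)


def frequent_positions(database, min_support):
    """Bit positions set in at least min_support Bloom filters."""
    support = Counter(p for bf in database for p in bf)
    return {p for p, count in support.items() if count >= min_support}


# Toy database (hypothetical values): the q-gram "sm" occurs in three names.
names = ["smith", "smyth", "smithers", "jones", "johnson", "miller"]
database = [encode(name) for name in names]

# Every position hashed from "sm" is set in all three filters, so these
# positions co-occur with support of at least 3.
frequent = frequent_positions(database, min_support=3)
```

Because every BF that encodes "sm" has all of the positions returned by `positions("sm")` set, those positions always appear together. A frequent itemset miner run over the database would surface such a group as a candidate encoding of a frequent q-gram. Hash collisions can add unrelated positions to `frequent`, which is one reason a real attack needs more than simple support counting.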
Peter Christen, Anushka Vidanage, Thilina Ranbaduge, Rainer Schnell
Peter Christen, Rainer Schnell, Dinusha Vatsalan, Thilina Ranbaduge
Peter Christen, Thilina Ranbaduge, Dinusha Vatsalan, Rainer Schnell
Weifeng Yin, Lifeng Yuan, Yizhi Ren, Weizhi Meng, Dong Wang, Qiuhua Wang