Condensed local differential privacy is a metric‑based relaxation of local differential privacy with better utility and flexibility than local differential privacy. However, existing solutions are deficient in terms of sequence pattern capture and utility. To address these limitations, this paper proposes SCM‑CLDP, a novel sequential data collection method based on condensed local differential privacy. SCM‑CLDP fully takes into account important information such as the length and transitions of sequential data during the collection process, through which the data collector is able to synthesize privacy‑preserving dataset close to the original dataset. Specifically, according to different perturbation objects, we propose two collection methods, SCM‑VP based on value perturbation and SCM‑TP based on transition perturbation, respectively. We theoretically prove that SCM‑VP and SCM‑TP satisfy sequence‑level condensed local differential privacy, and comparative experiments are conducted with existing solutions based on two real datasets in terms of Markov chain model accuracy, synthetic dataset utility, and frequent sequence pattern mining accuracy. The results show that SCM‑CLDP performs significantly better than the existing solutions, with SCM‑VP outperforming SCM‑TP in most cases. In the optimal situation, SCM‑CLDP reduces the error of the Markov chain model and the distribution of the synthetic dataset by at least one order of magnitude compared to the existing method. Meanwhile, SCM‑CLDP improves the accuracy of item frequency ranking of the synthetic dataset and the accuracy of frequent sequence pattern mining by nearly 30% compared to existing solutions.
Yuanyuan HeFayao WangXianjun DengJianbing NiJun FengShenghao Liu
Junpeng ZhangHui ZhuJiaqi ZhaoRongxing LuYandong ZhengJiezhen TangHui Li
Mehmet Emre GürsoyAcar TamersoyStacey TruexWenqi WeiLing Liu
Yuemin ZhangQingqing YeRui ChenHaibo HuQilong Han