Given a machine learning model and a record, membership inference attacks determine whether this record was part of the model's training dataset, which raises privacy concerns. Robust mitigation techniques that defend against this attack without degrading utility are therefore desirable. One of the state-of-the-art frameworks in this area is SELENA, which trains a protected model in two phases: Split-AI and Self-Distillation. In this paper, we introduce a novel approach to the Split-AI phase that weakens membership inference by using the Jacobian matrix norm and entropy. We experimentally demonstrate that, on three datasets (Purchase100, CIFAR-10, and SVHN), our approach decreases the memorization of the machine learning model more than SELENA at the same level of utility, in a setting in which no member of the training data is known.
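The abstract names two per-record quantities, the input-Jacobian norm and the prediction entropy, as the signals used in the modified Split-AI phase. A minimal sketch of how these two quantities can be computed is below; the toy linear-softmax model and the helper names (`predict`, `jacobian_norm`, `prediction_entropy`) are illustrative assumptions, not the paper's actual architecture or procedure.

```python
import numpy as np

# Hypothetical stand-in model: a single linear layer followed by softmax.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))  # 3 classes, 5 input features

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict(x):
    """Class-probability vector for input x."""
    return softmax(W @ x)

def jacobian_norm(x, eps=1e-5):
    """Frobenius norm of the output-vs-input Jacobian,
    approximated by forward finite differences."""
    base = predict(x)
    J = np.zeros((base.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (predict(xp) - base) / eps
    return float(np.linalg.norm(J))

def prediction_entropy(x):
    """Shannon entropy of the model's softmax output;
    low entropy on a record can indicate memorization."""
    p = predict(x)
    return float(-(p * np.log(p + 1e-12)).sum())

x = rng.normal(size=5)
print(jacobian_norm(x), prediction_entropy(x))
```

In this sketch, a large Jacobian norm means the prediction is locally sensitive to the input, and entropy is bounded by log(#classes); how these scores are actually combined to select models in Split-AI is specified in the paper body, not here.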