Jeff Mitchell, Jesús Martínez del Rincón, Niall McLaughlin
Abstract

Many image datasets contain spurious correlations (SCs): coincidental correlations between non-predictive features of the training images and the target label. A classifier trained on such a dataset will appear to perform well when evaluated on the training distribution, but will perform poorly in real-world testing where the spurious correlation no longer holds. This paper investigates how image classification models can be made robust to the presence of spurious correlations in their training data. To address this challenge, we propose UnLearning from Experience (ULE), a novel student-teacher framework that mitigates SCs without requiring group labels. Our method trains two classification models in parallel: a student and a teacher. Both models receive the same batches of training data. The student model is trained without constraints and freely pursues the spurious correlations in the data. The teacher model is trained to solve the same classification problem while avoiding the student's mistakes. Because the two models are trained in parallel, the better the student learns the spurious correlations, the more robust the teacher becomes. The teacher uses the gradient of the student's output with respect to its input to unlearn mistakes made by the student. Empirically, ULE improves worst-group accuracy by up to 29.0% on Waterbirds, 44.2% on CelebA, 29.4% on Spawrious, and 43.2% on UrbanCars compared to the baseline method.
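The student-teacher idea from the abstract can be illustrated with a toy sketch. Two linear classifiers are trained in parallel on data with a genuine feature (x1) and a spuriously correlated one (x2): the student follows plain SGD and absorbs the spurious correlation, while the teacher adds a penalty on the component of its weights aligned with the student's input gradient (for a linear model, its normalised weight vector). Note this is a minimal sketch under assumed values: the spurious rate, the penalty strength `lam`, and the simple alignment penalty are all illustrative choices, not the actual ULE loss from the paper.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    # Numerically clamped logistic function.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def make_batch(n, spurious_rate=0.95):
    # x1 is genuinely predictive of the label; x2 merely co-occurs
    # with the label in `spurious_rate` of training samples.
    batch = []
    for _ in range(n):
        y = random.choice([0, 1])
        x1 = (1.0 if y == 1 else -1.0) + random.gauss(0.0, 0.5)
        y2 = y if random.random() < spurious_rate else 1 - y
        x2 = (1.0 if y2 == 1 else -1.0) + random.gauss(0.0, 0.5)
        batch.append(((x1, x2), y))
    return batch

student = [0.0, 0.0]  # trained freely; expected to absorb the SC
teacher = [0.0, 0.0]  # penalised for aligning with the student
lr, lam = 0.1, 0.5    # learning rate and unlearning strength (assumed values)

for _ in range(300):
    for x, y in make_batch(32):
        # Student: plain logistic-regression SGD step.
        err_s = sigmoid(student[0] * x[0] + student[1] * x[1]) - y
        student[0] -= lr * err_s * x[0]
        student[1] -= lr * err_s * x[1]

        # Unit vector of the student's input gradient; for a linear
        # model this is just its normalised weight vector.
        norm = math.hypot(student[0], student[1])
        s_hat = (student[0] / norm, student[1] / norm) if norm > 1e-8 else (0.0, 0.0)

        # Teacher: same classification loss, plus a quadratic penalty
        # on the component of its weights aligned with the student.
        err_t = sigmoid(teacher[0] * x[0] + teacher[1] * x[1]) - y
        align = teacher[0] * s_hat[0] + teacher[1] * s_hat[1]
        teacher[0] -= lr * (err_t * x[0] + 2.0 * lam * align * s_hat[0])
        teacher[1] -= lr * (err_t * x[1] + 2.0 * lam * align * s_hat[1])
```

After training, the student's weight on x2 is positive (it has learned the spurious feature), while the alignment penalty limits how much the teacher can rely on the same input direction. The real method applies this idea to deep networks using the student's input gradients rather than raw weights.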