Mine Melodi Caliskan, Saeed Ghoorchian, Setareh Maghsudi
State-adversarial perturbations, arising from sensor spoofing, environmental interference, or targeted attacks, corrupt observations and invalidate the state-wise optimality assumptions commonly made in inverse reinforcement learning (IRL). We study IRL in state-adversarial MDPs (SA-MDPs), where only perturbed states are observable, and propose SAMM-IRL, a max-margin IRL framework that operates purely in the belief (perturbed) space without access to clean states. In place of point-wise, state-wise optimality, we adopt a robust optimality notion based on the expected return over the initial-state distribution, which remains well-posed under adversarial observation mappings. We prove (i) the existence of robust optimal policies in SA-MDPs, (ii) contraction properties of the intermediate RL operators under fixed and adaptive adversaries, and (iii) iteration bounds for the SAMM-IRL max-margin updates in belief space. Empirically, on a discrete GridWorld and continuous-control tasks, SAMM-IRL achieves stronger reward recovery and imitation performance under adversarial observations than baseline methods while maintaining stable policy updates. We also report perturbation parameters and ablation results in the main text to support reproducibility and practical deployment.
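The abstract does not spell out the belief-space max-margin updates, so the following is only a minimal sketch: it applies the classical max-margin projection method of Abbeel and Ng to a toy chain SA-MDP with a fixed adversary, with both expert and learner feature expectations estimated from perturbed observations. All names (step, perturb, solve_reward, EPS_ADV, and the tabular Q-iteration stand-in for the robust inner RL step) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SA-MDP: a 5-state chain; the adversary shifts the *observed* state.
N_S, N_A, GAMMA = 5, 2, 0.9   # states, actions (left/right), discount
EPS_ADV = 0.2                 # probability an observation is perturbed

def step(s, a):
    """True dynamics: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(N_S - 1, s + 1)

def perturb(s):
    """Fixed adversary: with prob EPS_ADV, shift the observed state by +/-1."""
    if rng.random() < EPS_ADV:
        return int(np.clip(s + rng.choice([-1, 1]), 0, N_S - 1))
    return s

def phi(obs):
    """One-hot features of the (perturbed) observation."""
    f = np.zeros(N_S)
    f[obs] = 1.0
    return f

def feature_expectations(policy, n_rollouts=200, horizon=30):
    """Monte-Carlo discounted feature expectations in the belief (perturbed) space."""
    mu = np.zeros(N_S)
    for _ in range(n_rollouts):
        s = 0
        for t in range(horizon):
            obs = perturb(s)             # the agent only sees the perturbed state
            mu += (GAMMA ** t) * phi(obs)
            s = step(s, policy[obs])     # but the true state evolves underneath
    return mu / n_rollouts

def solve_reward(w, n_iter=100):
    """Inner RL step: tabular Q-iteration for reward w^T phi.
    (A simple stand-in for the paper's robust policy-optimization step.)"""
    Q = np.zeros((N_S, N_A))
    for _ in range(n_iter):
        for s in range(N_S):
            for a in range(N_A):
                s2 = step(s, a)
                Q[s, a] = w[s] + GAMMA * Q[s2].max()
    return Q.argmax(axis=1)

# Expert always moves right; its feature expectations are also estimated
# through the perturbed channel, matching the belief-space setting.
mu_E = feature_expectations(np.ones(N_S, dtype=int))

# Max-margin projection loop (Abbeel & Ng style), run purely in belief space.
policy = rng.integers(N_A, size=N_S)
mu_bar = feature_expectations(policy)
for i in range(10):
    w = mu_E - mu_bar                    # margin direction = reward weights
    if np.linalg.norm(w) < 1e-2:
        break
    policy = solve_reward(w)
    mu = feature_expectations(policy)
    d = mu - mu_bar
    if d @ d < 1e-12:
        break
    mu_bar = mu_bar + (d @ (mu_E - mu_bar)) / (d @ d) * d   # projection step
    print(f"iter {i}: margin = {np.linalg.norm(mu_E - mu_bar):.4f}")
```

Note that the expert demonstrations are matched through the same perturbed observation channel the learner faces, so the margin shrinks entirely in belief space; this is the sense in which the sketch mirrors the abstract's "no access to clean states" setting.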