Munkhtulga Battogtokh, G. Flucke, Cosmin Davidescu, Rita Borgo
Fine-grained text classification with many similar labels is a challenge in practical applications, and interpreting predictions in this context is particularly difficult. To address this, we propose a simple framework that disentangles feature importance into more fine-grained input-label links. We demonstrate our framework on the task of intent recognition, which is widely used in real-life applications where trustworthiness is important, for state-of-the-art Transformer language models, using their attention mechanism. Our human and semi-automated evaluations show that our approach explains fine-grained input-label relations better than the popular feature importance estimation methods LIME and Integrated Gradients, and that it allows faithful interpretation through simple rules, especially when model confidence is high.