Ettore Mariotti, Anna Arias-Duart, Michele Cafagna, Albert Gatt, Dario García-Gasulla, José M. Alonso
<p>Among the existing eXplainable AI (XAI) approaches, Feature Attribution methods are a popular option due to their interpretable nature. However, each method leads to a different solution, thus introducing uncertainty regarding their reliability and coherence with respect to the underlying model. This work introduces TextFocus, a metric for evaluating the faithfulness of Feature Attribution methods for Natural Language Processing (NLP) classification tasks. To address the absence of ground-truth explanations for such methods, we introduce the concept of textual mosaics. A mosaic is composed of a combination of sentences belonging to different classes, which provides an implicit ground truth for attribution. The accuracy of explanations can then be evaluated by comparing feature attribution scores with the known class labels in the mosaic. Using TextFocus, the performance of six feature attribution methods is systematically compared on three sentence classification tasks, with Integrated Gradients being the best overall method in terms of faithfulness and computational requirements. The proposed methodology fills a gap in NLP evaluation by providing an objective way to assess Feature Attribution methods while finding their optimal parameters.</p>
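<p>The mosaic-based evaluation described above can be sketched in a few lines. The function name and the exact aggregation used here (the fraction of positive attribution mass that lands on tokens from the target class) are illustrative assumptions, not the paper's exact formula:</p>

```python
def focus_score(attributions, token_labels, target_class):
    """Hypothetical TextFocus-style score: of all positive attribution
    mass in a mosaic, the fraction assigned to target-class tokens."""
    positive = [(a, lbl) for a, lbl in zip(attributions, token_labels) if a > 0]
    total = sum(a for a, _ in positive)
    if total == 0:
        return 0.0  # no positive evidence at all
    on_target = sum(a for a, lbl in positive if lbl == target_class)
    return on_target / total

# Toy mosaic: two sentences concatenated, each token labeled by the
# class of the sentence it came from (values are made up).
attrs = [0.6, 0.3, -0.1, 0.05, 0.05]
labels = ["pos", "pos", "neg", "neg", "neg"]
print(focus_score(attrs, labels, "pos"))  # ≈ 0.9
```

<p>A faithful attribution method should concentrate positive scores on the tokens belonging to the predicted class, yielding a score near 1; attributions spread indiscriminately across both classes would score near the target class's share of the mosaic.</p>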
Yuya Asazuma, Kazuaki Hanawa, Kentaro Inui
Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, Jakob Grue Simonsen, Isabelle Augenstein
Tasuku Sato, Hiroaki Funayama, Kazuaki Hanawa, Kentaro Inui
Letitia Parcalabescu, Anette Frank