JOURNAL ARTICLE

Feature selection in text classification: Identifying spurious words with causal inference methods

Abstract

As many studies have noted, non-causal models may learn spurious correlations that act as shortcuts at prediction time, undermining cross-domain accuracy. Such correlations can arise from biased training data that contains spurious words: words with a neutral meaning that nonetheless induce the model to predict wrongly. Based on this assumption, we propose a series of methods to detect these spurious words before the training data is fed to the model. We apply causal inference methods that have recently gained attention, such as propensity score matching and inverse propensity score weighting, to perform feature selection before training. We experimented with multiple approaches to estimating propensity scores and obtained substantial improvements. We further used a BERT model to evaluate the effectiveness of the feature selection and found that its performance on both in-domain and out-of-domain test samples improves after the spurious words detected by our methods are removed from the training data.
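The core idea, treating each candidate word as a "treatment" and asking whether it still predicts the label once confounding words are accounted for, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes a binary label, treats the presence of every other vocabulary word as the covariate set, and estimates propensity scores with a plain logistic regression fit by gradient ascent (the abstract only says "multiple approaches" were tried). All function names and the thresholds in `looks_spurious` are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Set, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def covariates(docs: Sequence[Set[str]], word: str) -> Tuple[List[int], List[List[float]]]:
    # Treatment t_i = 1 if the candidate word occurs in document i.
    # Covariates (an assumption): presence of every other vocabulary word.
    vocab = sorted({w for doc in docs for w in doc} - {word})
    t = [1 if word in doc else 0 for doc in docs]
    X = [[1.0 if v in doc else 0.0 for v in vocab] for doc in docs]
    return t, X


def fit_propensity(X: List[List[float]], t: List[int],
                   lr: float = 1.0, iters: int = 3000) -> List[float]:
    # Logistic regression P(T=1 | X) via batch gradient ascent on the
    # log-likelihood; returns one propensity score per document.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, ti in zip(X, t):
            err = ti - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b += lr * grad_b / n
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def naive_difference(t: List[int], y: List[int]) -> float:
    # Unadjusted association: P(Y=1 | word present) - P(Y=1 | word absent).
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(t: List[int], y: List[int], e: List[float], eps: float = 1e-3) -> float:
    # Inverse propensity weighting estimate of the average treatment effect.
    total = 0.0
    for ti, yi, ei in zip(t, y, e):
        ei = min(max(ei, eps), 1.0 - eps)  # clip extreme scores for stability
        total += ti * yi / ei - (1 - ti) * yi / (1.0 - ei)
    return total / len(t)


def psm_att(t: List[int], y: List[int], e: List[float]) -> float:
    # 1-nearest-neighbour propensity score matching (with replacement);
    # returns the average effect on documents that contain the word.
    controls = [(ei, yi) for ti, yi, ei in zip(t, y, e) if ti == 0]
    diffs = []
    for ti, yi, ei in zip(t, y, e):
        if ti == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - ei))
            diffs.append(yi - y_match)
    return sum(diffs) / len(diffs)


def looks_spurious(naive: float, adjusted: float,
                   min_naive: float = 0.3, max_adjusted: float = 0.1) -> bool:
    # Hypothetical rule: a strong raw association that vanishes after adjustment.
    return abs(naive) >= min_naive and abs(adjusted) <= max_adjusted


if __name__ == "__main__":
    # Toy corpus: "great" drives the label; "movie" merely co-occurs with it.
    docs = [{"great", "movie"}] * 8 + [{"great"}] * 2 + [{"movie"}] * 2 + [set()] * 8
    y = [1] * 10 + [0] * 10
    t, X = covariates(docs, "movie")
    e = fit_propensity(X, t)
    naive = naive_difference(t, y)
    print("naive:", naive, "ipw:", ipw_effect(t, y, e), "psm:", psm_att(t, y, e))
    print("spurious:", looks_spurious(naive, ipw_effect(t, y, e)))
```

On this toy corpus the word "movie" has a large unadjusted association with the positive label, but both the IPW and matching estimates are close to zero once "great" is conditioned on, so the hypothetical rule flags it as a candidate for removal before training. With thousands of words in the vocabulary, a real pipeline would more likely use a regularized library classifier for the propensity model than this hand-written loop.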

Keywords:
Spurious relationship, Computer science, Inference, Causal inference, Artificial intelligence, Weighting, Feature selection, Feature (linguistics), Machine learning, Propensity score matching, Matching (statistics), Selection (genetic algorithm), Model selection, Selection bias, Pattern recognition (psychology), Data mining, Statistics, Mathematics
