JOURNAL ARTICLE

Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach

Harika Abburi, Pulkit Parikh, Niyati Chhaya, Vasudeva Varma

Year: 2021
Journal: Data Science and Engineering
Vol: 6 (4)
Pages: 359-379
Publisher: Springer Science+Business Media

Abstract

Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can support the fight against sexism, since it can facilitate effective analysis by gender studies researchers and government representatives engaged in policy making. In this paper, we examine the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, our 23-class problem formulation considers substantially more categories of sexism than any related prior work. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism. We devise self-training-based techniques, tailor-made for the multi-label nature of the problem, that use unlabeled samples to augment the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, both independently and in conjunction with the diversity-based method. In addition to data augmentation methods, we develop a neural model that combines biLSTM and attention with a domain-adapted BERT model in an end-to-end trainable manner. Further, we formulate a multi-level training approach in which models are sequentially trained using categories of sexism at different levels of granularity. Finally, we devise a loss function that exploits any label confidence scores associated with the data. Several proposed methods outperform various baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.
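The abstract names two generic ingredients, self-training on unlabeled samples and a loss that uses label confidence scores, without giving their details. The sketch below illustrates the general ideas only and should not be read as the authors' method: the function names `pseudo_label` and `confidence_weighted_bce`, the 0.9 threshold, and the per-label weighting scheme are all assumptions introduced for illustration.

```python
import math


def pseudo_label(probs, threshold=0.9):
    """Turn per-class probabilities for one unlabeled sample into a multi-label vector.

    Assumption for illustration: a class is assigned when its predicted
    probability reaches `threshold`; a sample with no confident class is
    skipped (None) rather than added to the labeled set.
    """
    labels = [1 if p >= threshold else 0 for p in probs]
    return labels if any(labels) else None


def confidence_weighted_bce(probs, labels, confidences, eps=1e-12):
    """Mean binary cross-entropy over classes, each term scaled by a confidence weight.

    Assumption for illustration: `confidences` holds one weight in [0, 1]
    per class; this is a generic weighting, not the loss from the paper.
    """
    if not probs or not (len(probs) == len(labels) == len(confidences)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, y, c in zip(probs, labels, confidences):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -c * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical model outputs for one unlabeled account over three classes.
    predicted = [0.95, 0.20, 0.91]
    print(pseudo_label(predicted))
    print(confidence_weighted_bce(predicted, [1, 0, 1], [1.0, 0.5, 0.8]))
```

In a self-training loop of this kind, samples for which `pseudo_label` returns a label vector would be appended to the labeled set before retraining, and lower confidence weights reduce how much an uncertain label contributes to the loss.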

Keywords:
Computer science; Artificial intelligence; Machine learning; Class (philosophy); Set (abstract data type); Multi-label classification; Natural language processing

Metrics

Cited By: 21
FWCI (Field Weighted Citation Impact): 2.26
Refs: 46
Citation Normalized Percentile: 0.90

Topics

Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence
Cancer-related gene regulation
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Authorship Attribution and Profiling
Physical Sciences →  Computer Science →  Artificial Intelligence