Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach

Harika Abburi; Pulkit Parikh; Niyati Chhaya; Vasudeva Varma

doi:10.1007/s41019-021-00168-y

JOURNAL ARTICLE

Year: 2021; Journal: Data Science and Engineering; Vol: 6 (4); Pages: 359-379; Publisher: Springer Science+Business Media

Abstract

Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can aid the fight against sexism by facilitating effective analyses by gender studies researchers and government officials involved in policy making. In this paper, we examine the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any related prior work through our 23-class problem formulation. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism. We devise self-training-based techniques tailor-made for the multi-label nature of the problem to utilize unlabeled samples for augmenting the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, independently and in conjunction with the method involving diversity. In addition to data augmentation methods, we develop a neural model which combines biLSTM and attention with a domain-adapted BERT model in an end-to-end trainable manner. Further, we formulate a multi-level training approach in which models are sequentially trained using categories of sexism of different levels of granularity. Moreover, we devise a loss function that exploits any label confidence scores associated with the data. Several proposed methods outperform various baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.
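As a rough illustration of the self-training idea mentioned in the abstract, the sketch below selects unlabeled samples whose multi-label predictions are confident for every class and turns those predictions into pseudo-labels. It is not the authors' procedure: the function name `select_pseudo_labeled`, the thresholds `hi` and `lo`, and the rule requiring at least one positive label are all assumptions made for this example, and the diversity and class-imbalance criteria described in the abstract are omitted.

```python
def select_pseudo_labeled(probs, hi=0.9, lo=0.1):
    """Pick unlabeled samples to add to the labeled set (hypothetical rule).

    probs: list of per-sample lists of predicted class probabilities.
    A sample is kept only if every class probability is confidently
    positive (>= hi) or confidently negative (<= lo), and at least one
    class is predicted positive. Returns a list of (index, labels) pairs.
    """
    selected = []
    for i, row in enumerate(probs):
        # Require a confident decision on every label, not just one.
        if all(p >= hi or p <= lo for p in row):
            labels = [1 if p >= hi else 0 for p in row]
            # Skip samples with no positive label (assumed rule).
            if any(labels):
                selected.append((i, labels))
    return selected


# Toy predicted probabilities for three unlabeled samples and three classes.
probs = [
    [0.95, 0.02, 0.97],  # confident on every class: kept
    [0.60, 0.05, 0.01],  # uncertain on class 0: skipped
    [0.03, 0.04, 0.02],  # confident, but no positive label: skipped
]
pseudo = select_pseudo_labeled(probs)
```

In a full self-training loop, the selected samples would be appended to the labeled set and the classifier retrained, with the selection repeated over several rounds.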

Keywords:

Computer science; Artificial intelligence; Machine learning; Class (philosophy); Set (abstract data type); Multi-label classification; Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 2.26

Citation Normalized Percentile: 0.90

Topics

Hate Speech and Cyberbullying Detection

Physical Sciences → Computer Science → Artificial Intelligence

Cancer-related gene regulation

Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology

Authorship Attribution and Profiling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Fine-grained Multi-label Sexism Classification Using Semi-supervised Learning

Semi-supervised Multi-task Learning for Multi-label Fine-grained Sexism Classification

Semi-supervised Multi-label Classification

Semi-supervised multi-label classification using incomplete label information

Robust multi-label semi-supervised classification