Fei Wang, Long Chen, Fei Xie, Cai Xu, Guangyue Lu
Text classification is a longstanding research topic in natural language processing (NLP). Deep learning has emerged as an effective paradigm for solving text classification problems. However, the performance of a deep model relies heavily on large-scale human-annotated data. In this paper, we propose a Semi-Supervised Contrastive Learning (SSCL) framework for text classification that significantly improves the performance of deep models when labeled data are limited. The framework consists of two components: a pseudo-label generation strategy and a contrastive learning scheme for text classification. We first devise a prompt-based strategy that trains Bidirectional Encoder Representations from Transformers (BERT) on a small amount of human-labeled data, yielding a task-correlated model capable of generating pseudo labels for unlabeled text. Then, for the text classification task, we use a two-step contrastive learning scheme: first, a deep model is pre-trained with pseudo labels as supervision to capture inter-class patterns while mitigating the negative impact of pseudo-label noise; second, the pre-trained model is fine-tuned on human-labeled data with a supervised contrastive learning approach. Because it benefits from the generated pseudo labels and anti-noise contrastive pre-training, our method needs only a small amount of labeled data to train for downstream text classification tasks. Experimental results on the Twitter sentiment classification dataset and the aspect classification dataset show that our method significantly outperforms baseline methods in a few-shot setting.
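The abstract does not spell out the fine-tuning objective. As an illustration only, the sketch below assumes the widely used supervised contrastive (SupCon) loss of Khosla et al. (2020): each embedding is pulled toward other samples that share its label and pushed away from all remaining samples in the batch. The function name `supcon_loss`, the temperature of 0.1, and the pure-Python list representation are hypothetical choices made for readability; this is not the SSCL authors' implementation.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # Project an embedding onto the unit sphere so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical value; the abstract gives none
) -> float:
    """Assumed SupCon-style loss, averaged over anchors that have at least one positive."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity from anchor i to every other sample.
        sims = {
            a: sum(p * q for p, q in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples (the denominator).
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Positives are the other samples carrying the same (human or pseudo) label.
        positives = [p for p in sims if labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(emb, [0, 0, 1, 1]) < supcon_loss(emb, [0, 1, 0, 1]))  # True
```

The usage line compares two labelings of the same four embeddings: when labels agree with the geometry (nearby vectors share a class), the loss is close to zero, whereas a labeling that pairs distant vectors yields a much larger loss. In the two-step scheme described above, the same kind of objective could in principle be driven by pseudo labels during pre-training and by human labels during fine-tuning, though the abstract does not state how SSCL handles pseudo-label noise inside the loss.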
Ziyuan Ma, Qiuyan Wang, Yan Yang, Hanning Chen
Chuanyao Zhang, Jianzong Wang, Zhangcheng Huang, Lingwei Kong, Xiaoyang Qu, Ning Cheng, Jing Xiao
Hongfeng Han, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen
Zhen Tan, Kaize Ding, Ruocheng Guo, Huan Liu
Shaoshuai Lu, Long Chen, Wenjing Wang, Cai Xu, Wei Zhao, Ziyu Guan, Guangyue Lu