ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval

Yue Yu; Yuchen Zhuang; Rongzhi Zhang; Meng Yu; Jiaming Shen; Chao Zhang

doi:10.18653/v1/2023.findings-acl.748

JOURNAL ARTICLE

Year: 2023

Abstract

With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks. Unlike prior works that generate training data with billion-scale natural language generation (NLG) models, we propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus. To realize this, we first conduct contrastive pretraining to learn an unsupervised dense retriever for extracting the most relevant documents using class-descriptive verbalizers. We then further propose two simple strategies, namely Verbalizer Augmentation with Demonstrations and Self-consistency Guided Filtering, to improve the topic coverage of the dataset while removing noisy examples. Experiments on nine datasets demonstrate that ReGen achieves a 4.3% gain over the strongest baselines and saves around 70% of the time when compared with baselines using large NLG models. Besides, ReGen can be naturally integrated with recently proposed large language models to boost performance.
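
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes document and class-verbalizer embeddings have already been produced by some dense encoder, uses toy 2-D vectors in their place, and omits both Verbalizer Augmentation with Demonstrations and Self-consistency Guided Filtering. The names `l2_normalize`, `retrieve_per_class`, and `top_k` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def retrieve_per_class(doc_emb: np.ndarray, class_emb: np.ndarray, top_k: int) -> dict:
    """Map each class index to the indices of its top_k most similar documents."""
    # Cosine similarity between every document and every class verbalizer.
    sims = l2_normalize(doc_emb) @ l2_normalize(class_emb).T  # (n_docs, n_classes)
    return {
        c: np.argsort(-sims[:, c])[:top_k].tolist()
        for c in range(class_emb.shape[0])
    }


# Toy 2-D vectors standing in for encoder outputs (hypothetical data).
doc_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]])
class_emb = np.array([[1.0, 0.0], [0.0, 1.0]])  # one verbalizer embedding per class

# Retrieved documents serve as pseudo-labeled training examples for each class.
pseudo_labels = retrieve_per_class(doc_emb, class_emb, top_k=2)
print(pseudo_labels)  # {0: [0, 1], 1: [2, 3]}
```

In this toy run, the document at index 4 sits equally close to both classes and is not retrieved for either one at `top_k=2`. A real pipeline would then train a classifier on the retrieved examples.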

Keywords: Computer science; Consistency (knowledge bases); Labeled data; Artificial intelligence; Natural language generation; Training (meteorology); Class (philosophy); Training set; Natural language processing; Machine learning; Simple (philosophy); Domain (mathematical analysis); Language model; Natural language

Metrics

FWCI (Field Weighted Citation Impact): 3.32

Citation Normalized Percentile: 0.91

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Text Readability and Simplification

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Zero-Shot Text Classification with Self-Training

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

Zero-shot Text Classification via Reinforced Self-training

Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM

CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation