Peng Cheng, Haobo Wang, Jue Wang, Lidan Shou, Ke Chen, Gang Chen, Chang Yao
Large-scale multi-label text classification (LMTC) aims to tag each text with multiple relevant labels from a large label space, which typically exhibits high sparsity, diversity, and skewness. To learn text representations in LMTC, a straightforward strategy is to learn a single vector that represents the whole text, which limits generalization to diverse labels; another popular strategy is to learn a label-specific representation per label via attention weighting, but its excessive emphasis on tail labels restricts overall performance. To cope with these limitations, we propose a novel LMTC framework, dubbed LADAR, which learns label-adaptive text representations to ensure high performance across large-scale label spaces. Specifically, we construct a representation pool for each text by collecting multi-layer features of the deep model as well as multi-granularity features of the text. All labels are then adaptively matched to their most relevant representations to predict the final scores. Experiments on five benchmark datasets demonstrate that LADAR substantially outperforms state-of-the-art LMTC approaches. In particular, LADAR performs significantly better on tail labels, e.g., a 5.09% relative improvement in PSP@5 over the best baseline on the Amazon-670K dataset.
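To make the label-adaptive matching described in the abstract concrete, the following is a minimal sketch, not the paper's implementation: it assumes a pool of K candidate representations per text (e.g., drawn from different encoder layers and text granularities) and one learned embedding per label, and matches each label to its highest-scoring pooled representation. All names (`LabelAdaptiveMatcher`, `rep_pool`, etc.) are hypothetical illustrations, not identifiers from the paper.

```python
import torch
import torch.nn as nn


class LabelAdaptiveMatcher(nn.Module):
    """Hypothetical sketch of label-adaptive matching: each label selects
    its most relevant representation from a per-text representation pool."""

    def __init__(self, num_labels: int, hidden_dim: int):
        super().__init__()
        # One learned embedding per label, used to score pooled representations.
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim) * 0.02)

    def forward(self, rep_pool: torch.Tensor) -> torch.Tensor:
        # rep_pool: (batch, K, hidden_dim) -- K candidate representations per
        # text, e.g., multi-layer and multi-granularity features.
        # Score every (label, representation) pair by dot product.
        scores = torch.einsum("bkd,ld->blk", rep_pool, self.label_emb)
        # Match each label to its most relevant representation (max over the
        # pool) and use that score as the label's logit.
        logits, _ = scores.max(dim=-1)  # (batch, num_labels)
        return logits


# Usage: 2 texts, a pool of 4 representations each, 6 labels, 8-dim features.
matcher = LabelAdaptiveMatcher(num_labels=6, hidden_dim=8)
pool = torch.randn(2, 4, 8)
print(matcher(pool).shape)  # torch.Size([2, 6])
```

The hard max over the pool is one plausible reading of "adaptively matched to their most relevant representations"; a soft attention-weighted combination over the pool would be a natural differentiable alternative.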