Peng Cheng, Haobo Wang, Jue Wang, Lidan Shou, Ke Chen, Gang Chen, Chang Yao
Large-scale multi-label text classification (LMTC) aims to tag each text with multiple relevant labels from a large label space, which typically exhibits high sparsity, diversity, and skewness. To learn text representations in LMTC, a straightforward strategy is to learn a single vector that represents the whole text, which limits generalization to diverse labels; another popular strategy is to learn a label-specific representation per label via attention weighting, but its excessive emphasis on tail labels restricts overall performance. To cope with these limitations, we propose a novel LMTC framework, dubbed LADAR, which learns label-adaptive text representations to ensure high performance across large-scale label spaces. Specifically, we construct a representation pool for each text by collecting multi-layer features of the deep model as well as multi-granularity features of the text. All labels are then adaptively matched to their most relevant representations to predict the final scores. Experiments on five benchmark datasets demonstrate that LADAR substantially outperforms state-of-the-art LMTC approaches. In particular, LADAR performs significantly better on tail labels, e.g., a 5.09% relative improvement in PSP@5 over the best baseline on the Amazon-670K dataset.
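To make the label-adaptive matching described in the abstract concrete, the following is a minimal sketch, not the paper's implementation: it assumes a pool of K candidate representations per text (e.g., drawn from different encoder layers and text granularities) and one learned embedding per label, and matches each label to its highest-scoring pooled representation. All names (`LabelAdaptiveMatcher`, `rep_pool`, etc.) are hypothetical illustrations, not identifiers from the paper.

```python
import torch
import torch.nn as nn


class LabelAdaptiveMatcher(nn.Module):
    """Hypothetical sketch of label-adaptive matching: each label selects
    its most relevant representation from a per-text representation pool."""

    def __init__(self, num_labels: int, hidden_dim: int):
        super().__init__()
        # One learned embedding per label, used to score pooled representations.
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim) * 0.02)

    def forward(self, rep_pool: torch.Tensor) -> torch.Tensor:
        # rep_pool: (batch, K, hidden_dim) -- K candidate representations per
        # text, e.g., multi-layer and multi-granularity features.
        # Score every (label, representation) pair by dot product.
        scores = torch.einsum("bkd,ld->blk", rep_pool, self.label_emb)
        # Match each label to its most relevant representation (max over the
        # pool) and use that score as the label's logit.
        logits, _ = scores.max(dim=-1)  # (batch, num_labels)
        return logits


# Usage: 2 texts, a pool of 4 representations each, 6 labels, 8-dim features.
matcher = LabelAdaptiveMatcher(num_labels=6, hidden_dim=8)
pool = torch.randn(2, 4, 8)
print(matcher(pool).shape)  # torch.Size([2, 6])
```

The hard max over the pool is one plausible reading of "adaptively matched to their most relevant representations"; a soft attention-weighted combination over the pool would be a natural differentiable alternative.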