JOURNAL ARTICLE

Isotropy-Enhanced Conditional Masked Language Models

Abstract

Non-autoregressive (NAR) models have been widely used for various text generation tasks to accelerate inference, but at some cost to generation quality. To achieve a good balance between inference speedup and generation quality, iterative NAR models such as CMLM and DisCo have been proposed, and follow-up work building on them has produced iterative models that reach very promising quality while maintaining a significant speedup. In this paper, we give further insight into iterative NAR models by exploring the anisotropic problem, i.e., the representations of distinct predicted target tokens being similar and indiscriminative. After confirming the anisotropic problem in iterative NAR models, we first analyze the effectiveness of contrastive learning and then propose the Look Neighbors strategy to enhance the learning of token representations during training. Experiments on four WMT datasets show that our methods consistently improve performance and alleviate the anisotropic problem of the conditional masked language model, even outperforming the current SoTA result on WMT14 EN→DE.
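
To make the anisotropic problem concrete: a standard diagnostic is the average pairwise cosine similarity between token representations, where a value near 1 means distinct target tokens collapse onto similar, indiscriminative vectors. The following is a minimal PyTorch sketch of that measurement, not code from the paper; the function name, shapes, and dimensions are illustrative assumptions.

import torch
import torch.nn.functional as F

def avg_pairwise_cosine(hidden: torch.Tensor) -> float:
    # hidden: (n, d) decoder representations, one row per predicted target token.
    # Illustrative assumption: this is not code from the paper.
    h = F.normalize(hidden, dim=-1)              # unit-normalize each token vector
    sim = h @ h.T                                # (n, n) cosine-similarity matrix
    n = sim.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()  # exclude self-similarity terms
    return (off_diag / (n * (n - 1))).item()

# Near-collinear representations (the anisotropic case) score close to 1,
# while well-spread random representations score near 0.
collapsed = torch.randn(1, 512).repeat(16, 1) + 0.01 * torch.randn(16, 512)
spread = torch.randn(16, 512)
print(f"collapsed: {avg_pairwise_cosine(collapsed):.3f}")  # ~1.0
print(f"spread:    {avg_pairwise_cosine(spread):.3f}")     # ~0.0

Under this measure, isotropy-enhancing training signals such as the contrastive learning and Look Neighbors strategies described in the abstract should drive the score for distinct target tokens downward.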

Keywords:
Speedup, Computer science, Inference, Iterative and incremental development, Process (computing), Artificial intelligence, Language model, Autoregressive model, Security token, Quality (philosophy), Machine learning, Iterative method, Algorithm, Mathematics

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 46
Citation Normalized Percentile: 0.17

Topics

Physical Sciences → Computer Science → Artificial Intelligence:
Topic Modeling
Natural Language Processing Techniques
Speech Recognition and Synthesis

Related Documents

JOURNAL ARTICLE

Universal Sentence Representation Learning with Conditional Masked Language Model

Ziyi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve

Journal: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Year: 2021
JOURNAL ARTICLE

AMOM: Adaptive Masking over Masking for Conditional Masked Language Model

Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, Min Zhang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2023, Vol: 37 (11), Pages: 13789-13797
JOURNAL ARTICLE

Universal Conditional Masked Language Pre-training for Neural Machine Translation

Pengfei Li, Liangyou Li, Meng Zhang, Minghao Wu, Qun Liu

Journal: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Year: 2022, Pages: 6379-6391