JOURNAL ARTICLE

AMOM: Adaptive Masking over Masking for Conditional Masked Language Model

Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, Min Zhang

Year: 2023
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 37 (11)
Pages: 13789-13797
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Transformer-based autoregressive (AR) methods have achieved appealing performance for varied sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up the inference stage, many non-autoregressive (NAR) strategies have been proposed in the past few years. Among them, the conditional masked language model (CMLM) is one of the most versatile frameworks, as it can support many different sequence generation scenarios and achieve very competitive performance on these tasks. In this paper, we further introduce a simple yet effective adaptive masking over masking strategy to enhance the refinement capability of the decoder and make the encoder optimization easier. Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model. Surprisingly, our proposed model yields state-of-the-art performance on neural machine translation (34.62 BLEU on WMT16 EN to RO, 34.82 BLEU on WMT16 RO to EN, and 34.84 BLEU on IWSLT De to En) and even better performance than the AR Transformer on 7 benchmark datasets with at least 2.2x speedup. Our code is available at GitHub.

Keywords:
Computer science, Automatic summarization, Machine translation, Transformer, Language model, Inference, Artificial intelligence, Autoregressive model, Masking (illustration), Benchmark (surveying), Speedup, Speech recognition, Encoder, Code (set theory), Machine learning, Programming language, Parallel computing

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 1.15
Refs: 79
Citation Normalized Percentile: 0.74

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence