We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention. We develop a model that plans ahead when computing alignments between the source and target sequences: rather than producing an alignment for a single time-step only, it constructs a matrix of proposed alignments for the next k time-steps, together with a commitment vector that governs whether to follow the existing plan or to recompute it. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model, a recent neural architecture for planning with hierarchical reinforcement learning that can also learn higher-level temporal abstractions. Our proposed model is end-to-end trainable with differentiable operations. We show that it outperforms strong baselines on a character-level translation task from WMT'15 with fewer parameters, and that it computes alignments that are qualitatively intuitive.
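To make the mechanism concrete, below is a minimal NumPy sketch of one plan-ahead attention step. The class name `AlignmentPlanner`, the dot-product scorer in `_propose`, and the fixed cyclic commitment gate are illustrative assumptions, not the paper's learned parameterization (which predicts both the alignment plan and the commitment vector with differentiable operations):

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


class AlignmentPlanner:
    """Illustrative plan-ahead attention: holds a k-step alignment plan
    and a commitment gate that decides when to recompute it."""

    def __init__(self, k, src_len):
        self.k = k
        # Alignment plan matrix: row t proposes alignments t steps ahead.
        self.plan = np.full((k, src_len), 1.0 / src_len)
        # Commitment vector: 1 at a step means "replan now". Here it is a
        # fixed k-step cycle; the trained model predicts it instead.
        self.commit = np.zeros(k)
        self.commit[0] = 1.0

    def _propose(self, decoder_state, annotations):
        # Stand-in scorer: dot products between the decoder state and the
        # source annotations, perturbed per row to give k distinct steps.
        scores = annotations @ decoder_state              # (src_len,)
        return np.stack([softmax(scores + 0.1 * t) for t in range(self.k)])

    def step(self, decoder_state, annotations):
        if self.commit[0] > 0.5:
            # Recompute the whole k-step plan from the current state.
            self.plan = self._propose(decoder_state, annotations)
        alignment = self.plan[0].copy()
        # Follow the plan: shift it (and the gate) one step forward.
        self.plan = np.roll(self.plan, -1, axis=0)
        self.commit = np.roll(self.commit, -1)
        return alignment


if __name__ == "__main__":
    k, src_len, dim = 4, 7, 16
    rng = np.random.default_rng(0)
    planner = AlignmentPlanner(k, src_len)
    annotations = rng.standard_normal((src_len, dim))
    for t in range(6):
        state = rng.standard_normal(dim)
        weights = planner.step(state, annotations)       # attention over source
        print(t, weights.round(3))
```

In this toy version the planner replans every k steps and otherwise follows the shifted plan; the actual model learns when to commit, which is what allows it to amortize alignment computation over several future time-steps.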