JOURNAL ARTICLE

VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation

Abstract

The accuracy of many visiolinguistic tasks has improved significantly through the application of vision-and-language (V&L) BERT. However, its application to vision-and-language navigation (VLN) remains limited. One reason is the difficulty of adapting the BERT architecture to the partially observable Markov decision process underlying VLN, which requires history-dependent attention and decision making. In this paper, we propose a time-aware recurrent BERT model for VLN. Specifically, we equip the BERT model with a recurrent function that maintains cross-modal state information for the agent. Through extensive experiments on R2R and REVERIE, we demonstrate that our model can replace more complex encoder-decoder models and achieve state-of-the-art results. Moreover, our approach can be generalised to other transformer-based architectures, supports pre-training, and is capable of solving navigation and referring expression tasks simultaneously.
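The recurrent idea in the abstract can be illustrated with a toy, single-head sketch in plain Python. It rests on assumptions that are ours, not details stated above: the agent's cross-modal state is a single vector carried from step to step; at each step this state attends over the instruction tokens and over the current candidate views; and its attention weights over the views are read out as action probabilities. The names `recurrent_step` and `navigate` and the `tanh` update rule are hypothetical placeholders. A real model would use learned projections and full transformer layers rather than raw dot products.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    return softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])


def weighted_sum(weights: Sequence[float], values: Sequence[Sequence[float]]) -> Vector:
    """Attention-weighted average of value vectors."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def recurrent_step(state, language, views) -> Tuple[Vector, Vector]:
    """Hypothetical single navigation step: returns (new_state, action_probs)."""
    # Cross-modal context from the instruction; the same vectors are reused every step.
    lang_ctx = weighted_sum(attend(state, language), language)
    # State-to-view attention; the same weights are read out as action probabilities.
    action_probs = attend(state, views)
    vis_ctx = weighted_sum(action_probs, views)
    # Placeholder recurrent update (assumption): squash old state plus both contexts.
    new_state = [math.tanh(s + l + v) for s, l, v in zip(state, lang_ctx, vis_ctx)]
    return new_state, action_probs


def navigate(initial_state, language, views_per_step):
    """Roll the agent forward, carrying the state across steps (hypothetical)."""
    state = list(initial_state)
    actions = []
    for views in views_per_step:
        state, probs = recurrent_step(state, language, views)
        # Greedy decision: pick the view with the highest attention weight.
        actions.append(max(range(len(probs)), key=probs.__getitem__))
    return state, actions


# Toy example: a 2-D "instruction" and two candidate views per step.
language = [[1.0, 0.0], [0.0, 1.0]]
steps = [
    [[5.0, 0.0], [0.0, 5.0]],
    [[0.0, 5.0], [5.0, 0.0]],
]
final_state, actions = navigate([1.0, 0.0], language, steps)
print(actions)  # → [0, 1]
```

Because `state` is threaded through `navigate`, the decision at the second step depends on what the agent attended to at the first step. This is the kind of history dependence a partially observable setting calls for; a model that saw only the current observation would lack such a channel.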

Keywords:
Computer science, Encoder, Transformer, Architecture, Artificial intelligence, Process (computing), Language model, Modal, Markov decision process, Markov process, Human–computer interaction, Computer vision, Programming language, Engineering

Metrics

Cited by: 204
Field-Weighted Citation Impact (FWCI): 15.95
References: 97
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

A Recurrent Vision-and-Language BERT for Navigation

Yicong Hong

Journal: Zenodo (CERN European Organization for Nuclear Research); Year: 2021
BOOK-CHAPTER

Improved VLN-BERT with Reinforcing Endpoint Alignment for Vision-and-Language Navigation

Chuan Jin, Boyuan Yang, Ruonan Liu

Communications in Computer and Information Science; Year: 2024; Pages: 119-133