VLN-Trans: Translator for the Vision and Language Navigation Agent

Yue Zhang; Parisa Kordjamshidi

doi:10.18653/v1/2023.acl-long.737

JOURNAL ARTICLE

Year: 2023 Pages: 13219-13233

Abstract

Language understanding is essential for the navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: (1) the mentioned landmarks are not recognizable by the navigation agent due to the different vision abilities of the instructor and the modeled agent; (2) the mentioned landmarks are applicable to multiple targets, and thus not distinctive for selecting the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on the recognizable and distinctive landmarks based on the agent’s visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on the Room2Room (R2R), Room4Room (R4R), and Room2Room Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.
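The two instruction issues described in the abstract can be illustrated with a toy example. The sketch below is not the paper's translator, which is a learned module; it is only a rule-based illustration of what "recognizable" and "distinctive" mean for a landmark. It assumes a hypothetical object detector that has already produced a set of labels for each candidate viewpoint, and the function name `classify_landmarks` and the sample data are invented for illustration.

```python
from collections import Counter


def classify_landmarks(instruction_landmarks, candidate_views):
    """Group landmarks by how useful they are for picking a viewpoint.

    candidate_views maps a viewpoint id to the set of object labels that a
    (hypothetical) detector reports for that viewpoint.
    """
    # Count in how many candidate viewpoints each label is visible.
    counts = Counter()
    for labels in candidate_views.values():
        for label in set(labels):
            counts[label] += 1

    groups = {"distinctive": [], "ambiguous": [], "unrecognized": []}
    for landmark in instruction_landmarks:
        seen = counts.get(landmark, 0)
        if seen == 0:
            # Issue 1: the agent cannot recognize the landmark at all.
            groups["unrecognized"].append(landmark)
        elif seen == 1:
            # Visible from exactly one candidate: it identifies the target.
            groups["distinctive"].append(landmark)
        else:
            # Issue 2: the landmark applies to several candidates.
            groups["ambiguous"].append(landmark)
    return groups


# Toy data (assumed): three candidate viewpoints and their detected objects.
landmarks = ["sofa", "door", "chandelier"]
views = {"v1": {"door", "sofa"}, "v2": {"door", "table"}, "v3": {"door"}}
groups = classify_landmarks(landmarks, views)
print(groups)
```

In this toy setting only "sofa" would help the agent choose among the candidates, which is the kind of landmark the abstract says the translator should focus on.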

Keywords:

Viewpoints; Computer science; Focus (optics); Task (project management); Human–computer interaction; Artificial intelligence; Computer vision; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 0.91

Citation Normalized Percentile: 0.70

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

VLN-ChEnv: Vision-language Navigation in Changeable Environments

VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation

VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

UAV-VLN: End-to-End Vision Language guided Navigation for UAVs

VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation