Qiming Liu, Xinmin Du, Zhe Liu, Hesheng Wang
Visual navigation is fundamental for embodied agents operating in expansive workspaces. The cognitive abilities of these agents form the essential basis for intelligent behavioral patterns, and memory and reasoning are vital components among these abilities. The former enhances decision-making by preserving a wide array of episodic spatio-temporal perception cues, while the latter enables proactive probabilistic inference over task distributions based on long-term experience. Although each of these two cognitive modalities has been studied individually, integrating them for enhanced decision-making remains a considerable challenge due to their substantial differences in representation and behavioral characteristics. In this paper, we introduce the Semantic-based Multi-modal Cognitive Graph (SMCG) for intelligent visual navigation. This framework is distinguished by its unified semantic-level representation of both memory and reasoning capabilities. Specifically, SMCG records sequences of observed objects rather than directly memorizing perceptual features as previous methods do, while reasoning is grounded in a semantic relation graph that captures correlations among objects. We additionally develop a hierarchical cognition extraction (HCE) pipeline and employ it to decode cognitive cues from SMCG and situation-aware subgraphs, thereby enhancing intelligent navigation behavior. Experimental results on image-goal navigation show pronounced performance improvements, attributable to the effective induction and rational application of heterogeneous cognitive modalities.
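To make the abstract's core idea concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: a memory that records observed object sequences, a relation graph built from object co-occurrence as a stand-in for the semantic correlations SMCG learns, and a situation-aware subgraph query keyed on currently observed objects. All class and method names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class SemanticGraphMemory:
    """Illustrative sketch of an SMCG-like structure (hypothetical API):
    episodic memory as object-label sequences plus a co-occurrence graph
    standing in for learned semantic relations."""

    def __init__(self):
        self.episodes = []                     # memory: object sequence per observation step
        self.edge_weight = defaultdict(float)  # reasoning: pairwise object correlations

    def observe(self, objects):
        """Record one step's detected object labels and update correlations."""
        self.episodes.append(list(objects))
        # Count each unordered pair of co-observed objects once per step.
        for a, b in combinations(sorted(set(objects)), 2):
            self.edge_weight[(a, b)] += 1.0

    def subgraph(self, context):
        """Situation-aware subgraph: edges touching currently observed objects."""
        ctx = set(context)
        return {edge: w for edge, w in self.edge_weight.items()
                if ctx & set(edge)}


mem = SemanticGraphMemory()
mem.observe(["sofa", "tv", "lamp"])
mem.observe(["tv", "lamp"])
mem.observe(["sink", "fridge"])
sg = mem.subgraph(["tv"])  # only living-room relations remain relevant
```

In this toy version, querying with `"tv"` keeps the `lamp`/`sofa` relations and drops the kitchen edge, mirroring how a situation-aware subgraph would restrict reasoning to the agent's current context.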