JOURNAL ARTICLE

Visual Navigation for Embodied Agents Using Semantic-Based Multi-Modal Cognitive Graph

Qiming Liu, Xinmin Du, Zhe Liu, Hesheng Wang

Year: 2025; Journal: IEEE Transactions on Image Processing; Vol: 34; Pages: 7989-8001; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Visual navigation is fundamental for embodied agents operating in expansive workspaces. The cognitive abilities of these agents form the essential basis for intelligent behavioral patterns, and memory and reasoning are vital among them. Memory improves decision-making by preserving a wide range of episodic spatio-temporal perception cues, whereas reasoning enables proactive, advanced probabilistic inference of task distributions from long-term experience. Although each of these cognitive modalities has been studied individually, integrating them for better decision-making remains a considerable challenge because they differ substantially in representation and behavioral characteristics. In this paper, we introduce the Semantic-based Multi-modal Cognitive Graph (SMCG) for intelligent visual navigation. The framework is distinguished by a unified, semantic-level representation of both memory and reasoning. Specifically, rather than directly memorizing perceptual features as previous methods do, SMCG records sequences of observed objects, while reasoning is based on a semantic relation graph that represents correlations among objects. We further develop a hierarchical cognition extraction (HCE) pipeline and use it to decode cognitive cues within SMCG and situation-aware subgraphs, thereby improving navigation behavior. Experimental results on image-goal navigation show pronounced performance improvements, which we attribute to the effective induction and rational application of heterogeneous cognitive modalities.
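The abstract describes SMCG only at a high level, so the following is a hypothetical toy sketch, not the authors' implementation. It assumes that nodes are object categories. Episodic "memory" edges count co-occurrences of objects within the same or consecutive observations. A hand-set table of category correlations stands in for the learned semantic relation graph, and a k-hop neighborhood around the currently visible objects serves as a crude analogue of a situation-aware subgraph. For simplicity the goal is an object category, whereas the paper targets image-goal navigation. All names (`ObjectCognitiveGraph`, `situation_subgraph`, `rank_goal_hints`) and all prior weights are illustrative.

```python
from collections import defaultdict


def _key(a: str, b: str) -> tuple:
    # Undirected edge key: order the pair so (a, b) and (b, a) map to the same edge.
    return (a, b) if a <= b else (b, a)


class ObjectCognitiveGraph:
    """Hypothetical toy graph over object categories with two edge types.

    memory:   episodic links between objects seen at the same or adjacent steps
    relation: a fixed semantic prior of category correlations (assumed values)
    """

    def __init__(self, relation_prior: dict):
        # Stand-in for a learned semantic relation graph; self-loops are dropped.
        self.relation = {_key(a, b): w for (a, b), w in relation_prior.items() if a != b}
        self.memory = defaultdict(int)   # edge -> co-occurrence count
        self.sequence = []               # ordered list of per-step object lists

    def observe(self, objects: list) -> None:
        # Record this step's objects; link them to each other and to the previous step.
        prev = self.sequence[-1] if self.sequence else []
        for i, a in enumerate(objects):
            for b in objects[i + 1:] + prev:
                if a != b:
                    self.memory[_key(a, b)] += 1
        self.sequence.append(list(objects))

    def neighbors(self, node: str) -> dict:
        # Sum memory counts and prior weights into one score per neighboring category.
        scores = defaultdict(float)
        for edges in (self.memory, self.relation):
            for (a, b), w in edges.items():
                if node in (a, b):
                    scores[b if a == node else a] += float(w)
        return dict(scores)

    def situation_subgraph(self, current: list, hops: int = 1) -> set:
        # Breadth-first expansion: all nodes within `hops` edges of the visible objects.
        frontier, seen = set(current), set(current)
        for _ in range(hops):
            nxt = set()
            for n in frontier:
                nxt |= set(self.neighbors(n)) - seen
            seen |= nxt
            frontier = nxt
        return seen

    def rank_goal_hints(self, current: list, goal: str) -> list:
        # Rank subgraph nodes by their combined link strength to the goal category.
        goal_links = self.neighbors(goal)
        sub = self.situation_subgraph(current)
        ranked = [(n, goal_links.get(n, 0.0)) for n in sub if n != goal]
        return sorted(ranked, key=lambda t: (-t[1], t[0]))


if __name__ == "__main__":
    prior = {("sofa", "tv"): 0.8, ("bed", "pillow"): 0.9, ("sofa", "table"): 0.5}
    g = ObjectCognitiveGraph(prior)
    g.observe(["door", "table"])
    g.observe(["sofa", "table"])
    print(g.rank_goal_hints(["table"], "tv"))
```

In this toy run, memory links the visible table to a sofa seen earlier, and the prior links sofa to tv, so the sofa is ranked as the most promising cue for finding the TV. The example only illustrates combining an episodic object memory with a semantic prior over one shared, object-level graph; it says nothing about how SMCG or HCE actually weight or decode these cues.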
