JOURNAL ARTICLE

Object-Centric Scene Representations Using Active Inference

Abstract

Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this letter, we propose a novel approach to scene understanding, leveraging an object-centric generative model that enables an agent to infer object category and pose in an allocentric reference frame using active inference, a neuro-inspired framework for action and perception. To evaluate the behavior of an active vision agent, we also propose a new benchmark in which, given a target viewpoint of a particular object, the agent must find the best matching viewpoint in a workspace with randomly positioned objects in 3D. We demonstrate that our active inference agent is able to balance epistemic foraging and goal-driven behavior, and quantitatively outperforms both supervised and reinforcement learning baselines by more than a factor of two in terms of success rate.
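The balance between epistemic foraging and goal-driven behavior described above is the hallmark of expected-free-energy minimization in active inference. The following is a minimal, hypothetical sketch (not the paper's implementation) of how an agent might score candidate viewpoints: each viewpoint's expected free energy combines an epistemic term (expected information gain about the object category) and an instrumental term (expected log-preference over observations), and the agent favors viewpoints with low expected free energy. All sizes, the likelihood tensor `A`, and the preference vector `C` are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 candidate viewpoints, 3 object categories, 2 observation outcomes.
n_views, n_states, n_obs = 4, 3, 2

# q_s: current (approximate posterior) belief over the object category.
q_s = np.array([0.5, 0.3, 0.2])

# A[v, s, o]: likelihood of observing outcome o from viewpoint v given category s
# (randomly sampled here purely for illustration).
A = rng.dirichlet(np.ones(n_obs), size=(n_views, n_states))

# C: log-preferences over observations; the "goal" observation is preferred.
C = np.log(np.array([0.9, 0.1]))

G = np.zeros(n_views)
for v in range(n_views):
    q_o = q_s @ A[v]                       # predicted observation distribution
    # Posterior over categories for each possible observation (Bayes' rule).
    post = A[v] * q_s[:, None]             # shape (n_states, n_obs), unnormalized
    post = post / post.sum(axis=0, keepdims=True)
    # Epistemic value: expected KL between posterior and prior beliefs,
    # i.e. the expected information gain about the category.
    kl = (post * np.log(post / q_s[:, None])).sum(axis=0)
    epistemic = q_o @ kl
    # Instrumental value: expected log-preference of the predicted observation.
    instrumental = q_o @ C
    G[v] = -(epistemic + instrumental)     # expected free energy (lower is better)

p_view = softmax(-G)                       # distribution over next viewpoints
best = int(np.argmin(G))
```

Early on, when `q_s` is flat, the epistemic term dominates and the agent explores informative viewpoints; as beliefs sharpen, the instrumental term takes over and drives it toward the preferred (goal) observation.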

Keywords:
Inference, Computer science, Artificial intelligence, Object, Benchmark, Workspace, Generative model, Active perception, Representation, Active vision, Matching, Perception, Machine learning, Computer vision, Robot, Mathematics

Metrics

Cited by: 5
FWCI (Field-Weighted Citation Impact): 1.59
References: 70
Citation Normalized Percentile: 0.72

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Robot Manipulation and Learning
Physical Sciences →  Engineering →  Control and Systems Engineering
