Toon Van de Maele, Tim Verbelen, Pietro Mazzaglia, Stefano Ferraro, Bart Dhoedt
Abstract: Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this letter, we propose a novel approach to scene understanding that leverages an object-centric generative model, enabling an agent to infer object category and pose in an allocentric reference frame using active inference, a neuro-inspired framework for action and perception. To evaluate the behavior of an active vision agent, we also propose a new benchmark in which, given a target viewpoint of a particular object, the agent must find the best matching viewpoint in a workspace with randomly positioned objects in 3D. We demonstrate that our active inference agent is able to balance epistemic foraging and goal-driven behavior, and that it quantitatively outperforms both supervised and reinforcement learning baselines by more than a factor of two in terms of success rate.