JOURNAL ARTICLE

Neural object-centric scene representation and generation

Singh, Gautam

Year: 2025
Journal: Rutgers University Community Repository (Rutgers University)
Publisher: Rutgers, The State University of New Jersey

Abstract

Although deep learning has achieved remarkable success, it still falls short in robustness, systematic generalization, interpretability, reasoning, and creating new knowledge from limited experience. Addressing these limitations requires learning representations that capture the underlying causal structure of the data. A key step in this direction is discovering hidden generative causal variables, such as objects and other scene factors. This dissertation develops architectures and algorithms to infer object-centric representations of visual scenes without human supervision or labels. Building on the idea of perception as inverse graphics, existing approaches rely on inverting renderers that are brittle, cumbersome, and limited to simple visual scenes. In Part One, we propose, for the first time, the idea of taking an expressive decoder and inverting it to learn object-centric representations. We show that this achieves an unprecedented scene decomposition ability in visually complex scenes. It gracefully handles ray-tracing effects such as shadows and reflections, which existing decoders handle poorly. We also show evidence of systematic generalization by decoding novel object combinations. Next, to extend these benefits from images to videos, we explore two routes, one recurrent and one parallelizable, and analyze their trade-offs. In Part Two, we build on our previous success and move beyond monolithic object representations. We introduce a novel method that discovers not only objects but also intra-object factors and, crucially, does so for the first time in complex scenes.

Keywords:
Generalization; Representation (politics); Object (grammar); Perspective (graphical); Key (lock); Simple (philosophy); Generative grammar; Perception; Deep learning

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.50

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Face recognition and analysis
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
