Although deep learning has achieved remarkable success, it still falls short in robustness, systematic generalization, interpretability, reasoning, and creating new knowledge from limited experience. Addressing these limitations requires learning representations that capture the underlying causal structure of the data. A key step in this direction is discovering hidden generative causal variables, such as objects and other scene factors. This dissertation develops architectures and algorithms to infer object-centric representations of visual scenes without human supervision or labels. Building on the idea of perception as inverse graphics, existing approaches rely on inverting renderers that are brittle, cumbersome, and limited to simple visual scenes. In Part One, we propose, for the first time, the idea of inverting an expressive decoder to learn object-centric representations. We show that this achieves an unprecedented scene decomposition ability in visually complex scenes. It gracefully handles aspects of ray tracing, such as shadows and reflections, that are poorly handled by existing decoders. We also show evidence of systematic generalization by decoding novel object combinations. Next, to extend these benefits from images to videos, we explore two routes, one recurrent and one parallelizable, and analyze their trade-offs. In Part Two, we build on our previous success and move beyond monolithic object representations. We introduce a novel method that discovers not only objects but also intra-object factors, and does so, for the first time, in visually complex scenes.