For object-centric representation learning, several slot-based methods have been proposed that separate objects using masks and learn each object separately. While these methods have proven useful on various downstream tasks, they are known to require a significant amount of computation for training. We propose introducing attention mechanisms into a slot-based method to simplify and speed up the computation. We adopt ViMON as the base architecture and propose two methods, named AttnViMON and SFA. We evaluate them in terms of reconstruction error, computation time, and performance on a downstream task. The proposed methods achieve a significant speed-up while showing even better performance.