Julie Artois, Peter Lambert, Glenn Van Wallendael
Images and videos allow us to explore places and connect to people all around the world, in the present or the past. What if we could break through the glass screen in front of us and step into those camera captures? Although challenging, light field technology has in recent years produced promising techniques, such as 3D Gaussian Splatting, which can render high-quality views of a scene reconstructed from camera captures alone. However, creating the scene model takes a significant amount of time and compute power, which makes it unviable for the multimedia industry, which outputs terabytes of new content daily. In this paper, we present a method for speeding up the modeling process, not by optimizing the training, but by initializing the pipeline with an already semi-finished reconstruction. This is done by estimating the depth maps of the camera images, fusing them, and converting the result into a dense set of Gaussian splats that already closely resembles the scene. Afterwards, the default training process is applied to fine-tune the model and quickly synthesize new high-quality views. We show that, on average after 1000 iterations, our method improves PSNR by +1.27 dB, SSIM by +0.065, and LPIPS by -0.10 compared to the default initialization.
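The core idea of depth-based initialization can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal NumPy example, assuming a pinhole camera model, that back-projects a depth map to colored world-space points and then derives per-splat scales from a nearest-neighbor heuristic (a common choice in Gaussian Splatting pipelines). The function names `backproject_depth` and `init_gaussians` are hypothetical.

```python
import numpy as np

def backproject_depth(depth, rgb, K, c2w):
    """Back-project a depth map to colored 3D world points (illustrative sketch).

    depth: (H, W) depth values in the camera frame
    rgb:   (H, W, 3) colors in [0, 1]
    K:     (3, 3) camera intrinsics (pinhole model)
    c2w:   (4, 4) camera-to-world transform
    Returns (N, 3) world points and (N, 3) colors for pixels with depth > 0.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    # Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)  # homogeneous (N, 4)
    pts_world = (c2w @ pts_cam.T).T[:, :3]
    return pts_world, rgb[valid]

def init_gaussians(points, colors, k=3):
    """Turn a point cloud into initial splat parameters.

    Heuristic: size each isotropic Gaussian by the mean distance to its
    k nearest neighbors, so dense regions get small splats.
    """
    # Brute-force kNN; fine for a sketch, real pipelines would use a KD-tree.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbors
    knn = np.sort(np.sqrt(d2), axis=1)[:, :k]
    scales = knn.mean(axis=1)
    return {"means": points, "colors": colors, "scales": scales}
```

Points from several fused depth maps would be concatenated before `init_gaussians`, giving the dense initialization that the default optimizer then fine-tunes.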