Yu-Jie Yuan, Yu-Kun Lai, Yihua Huang, Leif Kobbelt, Lin Gao
The recently proposed neural radiance fields (NeRF) use a continuous function, formulated as a multi-layer perceptron (MLP), to model the appearance and geometry of a 3D scene. This enables realistic synthesis of novel views, even for scenes with view-dependent appearance. Many follow-up works have since extended NeRF in different ways. However, a fundamental restriction remains: the method requires a large number of images captured from densely placed viewpoints for high-quality synthesis, and the quality of the results degrades quickly when the number of captured views is insufficient. To address this problem, we propose a novel NeRF-based framework capable of high-quality view synthesis using only a sparse set of RGB-D images, which can be easily captured using the cameras and LiDAR sensors on current consumer devices. First, a geometric proxy of the scene is reconstructed from the captured RGB-D images. Renderings of the reconstructed scene, along with precise camera parameters, are then used to pre-train a network. Finally, the network is fine-tuned with a small number of real captured images. We further introduce a patch discriminator to supervise the network under novel views during fine-tuning, as well as a 3D color prior to improve synthesis quality. We demonstrate that our method can generate arbitrary novel views of a 3D scene from as few as 6 RGB-D images. Extensive experiments show the improvements of our method over existing NeRF-based methods, including approaches that also aim to reduce the number of input images.
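The two-stage training schedule described above (pre-training on proxy renderings, then fine-tuning on the few real captures with an adversarial patch loss) could be organized roughly as in the following minimal PyTorch sketch. Every name here (TinyRadianceMLP, render_novel_patch, the hyperparameters, the dummy tensors) is an illustrative stand-in, not the paper's code; actual proxy reconstruction, volume rendering along rays, and the 3D color prior are omitted for brevity.

```python
import torch
import torch.nn as nn

class TinyRadianceMLP(nn.Module):
    """Stand-in for the NeRF MLP: maps a 5D (position + view direction)
    sample directly to an RGB color."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: one real/fake logit per image patch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),  # grid of patch logits
        )

    def forward(self, img):
        return self.net(img)

def render_novel_patch(model, size=32):
    """Stub for rendering a patch from a random novel view; a real
    implementation would volume-render rays through the radiance field."""
    samples = torch.rand(size * size, 5)
    return model(samples).t().reshape(1, 3, size, size)

def pretrain(model, proxy_samples, proxy_colors, steps=1000, lr=5e-4):
    """Stage 1: supervise the network with renderings of the proxy."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(proxy_samples), proxy_colors)
        opt.zero_grad(); loss.backward(); opt.step()

def finetune(model, disc, real_samples, real_colors, real_patches,
             steps=500, lr=1e-4, adv_weight=0.01):
    """Stage 2: fit the few real captures; the discriminator judges
    patches rendered from novel views against real captured patches."""
    opt_g = torch.optim.Adam(model.parameters(), lr=lr)
    opt_d = torch.optim.Adam(disc.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        # Discriminator step: real patches vs. detached rendered patches.
        d_real = disc(real_patches)
        d_fake = disc(render_novel_patch(model).detach())
        loss_d = (bce(d_real, torch.ones_like(d_real))
                  + bce(d_fake, torch.zeros_like(d_fake)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator step: photometric loss plus adversarial patch loss.
        d_fake = disc(render_novel_patch(model))
        loss_g = (nn.functional.mse_loss(model(real_samples), real_colors)
                  + adv_weight * bce(d_fake, torch.ones_like(d_fake)))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

if __name__ == "__main__":
    model, disc = TinyRadianceMLP(), PatchDiscriminator()
    # Dummy stand-ins: dense proxy renderings vs. sparse real captures.
    pretrain(model, torch.rand(4096, 5), torch.rand(4096, 3), steps=10)
    finetune(model, disc, torch.rand(512, 5), torch.rand(512, 3),
             torch.rand(8, 3, 32, 32), steps=10)
```

One design point this sketch reflects: a patch discriminator emits a grid of per-patch logits rather than a single image-level score, so the adversarial supervision under novel views stays local and does not require any ground-truth image at those viewpoints.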