In this paper, we propose FCA-MVSNet, a learning-based multi-view stereo network built on a feature correlation aggregation network (FCANet). We observe that the source views used to infer the depth of the reference view can differ considerably from one another, as is evident in the images themselves. We therefore argue that each source view should contribute differently when building the cost volume, according to its similarity to the reference view. To this end, FCANet infers this similarity and uses it to guide cost aggregation. In addition, we build the cost volume and infer depth in a coarse-to-fine manner. We evaluate FCA-MVSNet and conduct ablation studies of FCANet on the DTU dataset. The results show that our method significantly outperforms the baseline and achieves state-of-the-art results, with reconstruction completeness breaking the 0.3 mm mark on the mean distance metric. Moreover, FCANet significantly improves reconstruction quality compared with the widely used variance metric.
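The contrast between variance-based aggregation and similarity-guided aggregation can be illustrated with a minimal sketch. This is not the FCANet architecture: the function names `variance_cost` and `weighted_cost` are hypothetical, feature volumes are flattened to plain lists, and the per-view weights are supplied by hand, whereas in the paper the similarity is inferred by a network.

```python
from typing import List, Sequence


def variance_cost(volumes: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight variance over all views (reference included), per element."""
    n = len(volumes)
    cost = []
    for k in range(len(volumes[0])):
        mean = sum(v[k] for v in volumes) / n
        cost.append(sum((v[k] - mean) ** 2 for v in volumes) / n)
    return cost


def weighted_cost(
    ref: Sequence[float],
    sources: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted squared difference between each source view and the reference.

    `weights` stands in for a per-view similarity score (an assumption here);
    a larger weight lets that source view contribute more to the cost.
    """
    if len(sources) != len(weights):
        raise ValueError("one weight per source view is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    cost = []
    for k in range(len(ref)):
        acc = sum(w * (s[k] - ref[k]) ** 2 for w, s in zip(weights, sources))
        cost.append(acc / total)
    return cost


if __name__ == "__main__":
    ref = [1.0, 2.0]
    sources = [[1.0, 2.0], [3.0, 6.0]]  # second view disagrees with the reference
    print(variance_cost([ref] + sources))
    print(weighted_cost(ref, sources, [0.75, 0.25]))  # dissimilar view down-weighted
```

In this toy example, giving the dissimilar second view a small weight reduces its influence on the aggregated cost, whereas the variance formulation treats every view equally.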
Matteo Poggi, Andrea Conti, Stefano Mattoccia
Lina Wang, Jiangfeng She, Qiang Zhao, Xiang Wen, Yuzheng Guan
Ming Han, Hui Yin, Aixin Chong, Qianqian Du
Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Zhang Yanning
Weitao Chen, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, Xuansong Xie