In this paper, we propose PT-MVSNet, a new multi-view stereo (MVS) network. MVS is a well-established reconstruction method that uses multiple images to reconstruct a 3D scene. It has been applied in many practical settings, such as architecture, cultural heritage protection, and map making. However, MVS still faces many challenges, including inaccurate feature matching, excessive image noise, and high computational complexity. To address inaccurate feature matching, we use a Transformer as the main structure of the feature-matching stage and add a patch-based overlap attention module (POLA). With this design, PT-MVSNet extracts image features more effectively. To validate the model, we conducted experiments on the DTU dataset and evaluated its performance with two metrics, accuracy and completeness. The experimental results show that our method outperforms the latest methods, reaching an accuracy of 0.386 and a completeness of 0.271.
Liang Wang, Licheng Sun, Fuqing Duan
Yu Liang, Dongxu Duan, Yuhong Yuan, Kai Zhang
Haoran Kong, Fanzi Zeng, Longbao Dai, Jingyang Hu, Jianghao Cai, Jianxia Chen, Ruihui Li, Hongbo Jiang
Peiyao Li, Gongjian Wen, Shilin Zhou
Lu Lu, Hongbo Huang, Xiaoxu Yan, Yizhuo Liu, Zixia Zhang, Hanjun Chen, Shichao Zhou, Zixuan Rui