JOURNAL ARTICLE

PT-MVSNet: Overlapping Attention Multi-view Stereo Network with Transformers

Abstract

In this paper, we propose PT-MVSNet, a new multi-view stereo (MVS) network. Multi-view stereo is a successful reconstruction method that uses multiple images to reconstruct a 3D scene, and it has been applied in many practical domains such as architecture, cultural heritage protection, and map making. MVS still faces many challenges, including inaccurate feature matching, excessive image noise, and high computational complexity. To address inaccurate feature matching, we use the Transformer as the main structure for feature matching and add a patch-based overlap attention module (POLA). With this design, the proposed PT-MVSNet extracts image features more effectively. To validate the model, we conducted experiments on the DTU dataset and evaluated its performance with two metrics. The experimental results show that our method outperforms the latest methods, reaching an accuracy of 0.386 and a completeness of 0.271.
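The two metrics named in the abstract, accuracy and completeness, are conventionally defined on DTU as mean point-to-point distances: accuracy from the reconstructed point cloud to the ground truth, and completeness from the ground truth to the reconstruction, with lower values being better. The following is a minimal sketch of that idea, not the evaluation code used in the paper. It assumes plain Euclidean nearest-neighbour distances on small point sets and omits the outlier thresholds and observability masks of the official DTU protocol; the function names and the toy points are hypothetical.

```python
import numpy as np


def mean_nearest_distance(src, dst):
    """Mean Euclidean distance from each point in src to its nearest point in dst."""
    # Pairwise differences, shape (len(src), len(dst), 3). Brute force, so only
    # suitable for small point sets (assumption: illustration, not benchmark scale).
    diff = src[:, None, :] - dst[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return float(dist.min(axis=1).mean())


def accuracy_completeness(reconstructed, ground_truth):
    """Return (accuracy, completeness) for two (N, 3) point arrays; lower is better."""
    # Accuracy: how far reconstructed points lie from the ground-truth surface.
    accuracy = mean_nearest_distance(reconstructed, ground_truth)
    # Completeness: how far ground-truth points lie from the reconstruction.
    completeness = mean_nearest_distance(ground_truth, reconstructed)
    return accuracy, completeness


# Hypothetical toy data: the reconstruction misses the third ground-truth point.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
rec = np.array([[0.0, 0.1, 0.0], [1.0, 0.1, 0.0]])
acc, comp = accuracy_completeness(rec, gt)
print(f"accuracy={acc:.3f}, completeness={comp:.3f}")
```

In this toy case the reconstruction is close to the surface wherever it exists, so accuracy is small, while the uncovered ground-truth point raises the completeness value. This is why the two numbers are reported separately.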

Keywords:
Computer science, Artificial intelligence, Transformer, Computer vision, Feature extraction, Computation, Feature (linguistics), Matching (statistics), Feature matching, Pattern recognition (psychology), Algorithm, Mathematics, Engineering

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.18
Refs: 15
Citation Normalized Percentile: 0.40

Topics

Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
3D Surveying and Cultural Heritage
Physical Sciences →  Earth and Planetary Sciences →  Geology