WANG Sicheng, JIANG Hao, CHEN Xiao
At present, deep multi-view stereo (MVS) methods widely introduce Transformers into cascade networks to achieve high-resolution depth estimation, thereby ensuring highly accurate and complete 3D reconstruction. However, Transformer-based methods are limited by their computational cost and cannot be extended to the more refined stages. To solve this problem, this paper proposes a novel cross-scale Transformer-based MVS network that manages feature representations at different stages without incurring additional computation. Specifically, this study introduces an Adaptive Matching-aware Transformer (AMT) that applies different combinations of interactive attention at multiple scales, enabling the proposed network to capture contextual information within each image and to strengthen feature relationships across images. In addition, this study proposes Dual Feature Guided Aggregation (DFGA), which embeds coarse global semantic information into the construction of finer cost volumes, further enhancing the perception of global and local features. A feature metric loss is also designed to evaluate feature deviation before and after the Transformer, thereby reducing the impact of feature mismatch on depth estimation. Experimental results show that the proposed network achieves completeness and overall scores of 0.264 and 0.302, respectively, on the DTU dataset, and average reconstruction scores of 64.28 and 38.03 on the intermediate and advanced scenarios of the Tanks and Temples benchmark, respectively.
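To make the described components concrete, the following is a minimal PyTorch sketch of one plausible reading of the abstract: a self-/cross-attention pair of the kind AMT combines at each scale, a DFGA-style fusion that injects upsampled coarse semantic features into a finer stage before cost-volume construction, and an L1 feature metric loss between features before and after the Transformer. All class, function, and argument names here (InteractiveAttentionBlock, dfga_fuse, feature_metric_loss) are hypothetical illustrations under these assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveAttentionBlock(nn.Module):
    """One self-attention + cross-attention pair over flattened (B, H*W, C)
    feature maps. AMT presumably combines such pairs differently at each
    scale; this block only illustrates the basic interaction."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, ref, src):
        # Intra-image attention: capture contextual information within a view.
        ref = self.norm1(ref + self.self_attn(ref, ref, ref)[0])
        # Inter-image attention: relate reference features to a source view.
        ref = self.norm2(ref + self.cross_attn(ref, src, src)[0])
        return ref

def dfga_fuse(coarse_feat, fine_feat, proj):
    """DFGA-style fusion (hypothetical): upsample coarse global features and
    inject them into the finer stage before cost-volume construction."""
    up = F.interpolate(coarse_feat, size=fine_feat.shape[-2:],
                       mode="bilinear", align_corners=False)
    return proj(torch.cat([fine_feat, up], dim=1))

def feature_metric_loss(feat_before, feat_after):
    """Hypothetical feature metric loss: mean L1 deviation between features
    before and after the Transformer, penalizing feature mismatch."""
    return F.l1_loss(feat_after, feat_before)

# Toy usage at one scale: two views of 32x40 features with 64 channels.
B, H, W, C = 2, 32, 40, 64
ref = torch.randn(B, H * W, C)
src = torch.randn(B, H * W, C)
out = InteractiveAttentionBlock(C)(ref, src)
print("feature metric loss:", feature_metric_loss(ref, out).item())

coarse = torch.randn(B, C, H // 2, W // 2)
fine = torch.randn(B, C, H, W)
fused = dfga_fuse(coarse, fine, nn.Conv2d(2 * C, C, kernel_size=1))
print("fused shape:", tuple(fused.shape))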