CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

Weitao Chen; Hongbin Xu; Zhipeng Zhou; Yang Liu; Baigui Sun; Wenxiong Kang; Xuansong Xie

doi:10.24963/ijcai.2023/67

Year: 2023, Pages: 599-608

Abstract

The core of Multi-view Stereo (MVS) is the matching process between reference and source pixels. Cost aggregation plays a significant role in this process, and previous methods handle it with CNNs. This may inherit a natural limitation of CNNs: because of their limited local receptive fields, they fail to discriminate repetitive or incorrect matches. To address this issue, we introduce Transformers into cost aggregation. However, this raises another problem: the computational complexity of self-attention grows quadratically with the number of tokens, which can cause memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on the cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, the Residual Regression Transformer (RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in that improves learning-based MVS methods.
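The abstract does not give RDACT's internals, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' RDACT or RRT. It assumes a cost volume of shape (C, D, H, W) and plain single-head scaled dot-product attention with random (not learned) projections. It illustrates two points from the abstract: why full self-attention over every cost-volume cell is prohibitive, and how restricting attention to each pixel's depth column, with a residual connection, keeps the score tensor linear in H·W. The helpers `depth_self_attention` and `attention_entries`, the window size, and the example volume sizes are hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def depth_self_attention(cost, wq, wk, wv):
    """Residual single-head self-attention along the depth axis (hypothetical helper).

    cost: cost volume of shape (C, D, H, W). Each pixel's D depth hypotheses
    form one sequence of D tokens with C channels; pixels do not attend to
    each other, so the score tensor has shape (H*W, D, D).
    wq, wk, wv: (C, C) projection matrices (random in the demo, not learned).
    """
    c, d, h, w = cost.shape
    tokens = cost.transpose(2, 3, 1, 0).reshape(h * w, d, c)   # (HW, D, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)              # (HW, D, D)
    out = softmax(scores, axis=-1) @ v                          # (HW, D, C)
    out = out.reshape(h, w, d, c).transpose(3, 2, 0, 1)         # back to (C, D, H, W)
    return cost + out  # residual path keeps the original matching cost


def attention_entries(d, h, w, window):
    """Count query-key scores per attention layout for a D x H x W cost volume.

    window: side length of a square, non-overlapping spatial window
    (illustrative assumption; H and W are assumed divisible by it).
    """
    n = d * h * w
    return {
        "full": n * n,                                  # every cell attends to every cell
        "depth_only": h * w * d * d,                    # attention within each pixel's depth column
        "spatial_window": d * h * w * window * window,  # per depth slice, within each window
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, d, h, w = 8, 16, 4, 5
    cost = rng.standard_normal((c, d, h, w))
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    print(depth_self_attention(cost, wq, wk, wv).shape)  # (8, 16, 4, 5)

    # Illustrative sizes: 48 depth hypotheses on a 128 x 160 feature map.
    sizes = attention_entries(48, 128, 160, window=4)
    for name, count in sizes.items():
        print(f"{name}: {count:,} scores, {count * 4 / 1e9:.2f} GB in float32")
```

With these illustrative sizes, full attention would need 966,367,641,600 scores (about 3.9 TB in float32), while depth-only attention needs 47,185,920 (about 0.19 GB) and 4 x 4 windowed spatial attention needs 15,728,640. This arithmetic is one way to see the memory-overflow concern the abstract raises for naive Transformer aggregation.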

Keywords:

Computer science, Transformer, Residual, Inference, Artificial intelligence, Algorithm, Engineering

Metrics

FWCI (Field Weighted Citation Impact): 4.37

Citation Normalized Percentile: 0.94

Topics

Advanced Vision and Imaging (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Optical measurement and interference techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Image Processing Techniques and Applications (Physical Sciences → Engineering → Media Technology)

Related Documents

Multi-View Image Feature Correlation Guided Cost Aggregation For Multi-View Stereo

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Frequency-enhanced representation and cost aggregation for multi-view stereo

Cascade Cost Volume Multi-View Stereo Network with Transformer and Pseudo 3D

Efficient Multi-View Stereo for Space Target with Mamba-Based Cost Aggregation