In current deep-learning-based multi-view stereo methods, feature extraction and cost volume regularization are two key steps that affect reconstruction quality. Most current methods struggle both to extract the required features accurately and to make full use of the multi-scale contextual semantic information in the cost volumes. In this work, we propose MA-CVP-MVSNet, a multi-view stereo network based on a hybrid attention mechanism. The proposed method is built around two attention modules. The first is a Criss-Cross Attention module, which captures the global dependencies among pixels in the feature map. The second is an SK Attention module, which is used in cost volume regularization to aggregate multi-scale contextual semantic information in the cost volumes. Experiments show that our method markedly improves accuracy and achieves competitive results.
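To make the idea of criss-cross attention concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes identity query, key, and value projections, whereas the published Criss-Cross Attention design uses learned projections; the function name `criss_cross_attention` and the nested-list layout are illustrative choices. Each pixel attends only to the pixels in its own row and column, and the aggregated context is added back to the input feature.

```python
import math


def criss_cross_attention(feat):
    """Single criss-cross attention pass over an H x W grid of feature vectors.

    feat is a nested list: feat[i][j] is a list of C floats.
    Assumption: query, key, and value are the raw features themselves
    (no learned projections), so this is only an illustration.
    """
    h, w = len(feat), len(feat[0])
    out = []
    for i in range(h):
        row_out = []
        for j in range(w):
            q = feat[i][j]
            # Criss-cross path: the whole column j plus the rest of row i,
            # so the pixel itself is counted once (H + W - 1 positions).
            path = [(r, j) for r in range(h)]
            path += [(i, c) for c in range(w) if c != j]
            # Dot-product affinity between the query and each path position.
            scores = [sum(a * b for a, b in zip(q, feat[r][c])) for r, c in path]
            # Numerically stable softmax over the path.
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            weights = [e / z for e in exps]
            # Weighted sum of the features along the path.
            ctx = [
                sum(wt * feat[r][c][k] for wt, (r, c) in zip(weights, path))
                for k in range(len(q))
            ]
            # Residual connection: context added to the input feature.
            row_out.append([x + y for x, y in zip(q, ctx)])
        out.append(row_out)
    return out
```

A single pass only propagates information along one row and one column per pixel; applying the same operation a second time lets every pixel receive information from every other pixel, which is how this kind of module can reach full-image context at lower cost than dense self-attention.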
Song Zhang, Lin Li, Jiangxuan Qiao
Peiyao Li, Gongjian Wen, Shilin Zhou
Xiaoyan Zhang, Hao Shi, Chaozheng Wang
Po-Heng Chen, Hsiao-Chien Yang, Kuan‐Wen Chen, Yong‐Sheng Chen
Yu Liang, Dongxu Duan, Yuhong Yuan, Kai Zhang