JOURNAL ARTICLE

Feature‐enhanced representation with transformers for multi‐view stereo

Lintao Xiang, Hujun Yin

Year: 2024   Journal: IET Image Processing   Vol: 18 (6)   Pages: 1530-1539   Publisher: Institution of Engineering and Technology

Abstract

Most existing multi‐view stereo (MVS) methods fail to consider global context information in the stages of feature extraction and cost aggregation. As transformers have shown remarkable performance on various vision tasks due to their ability to perceive global contextual information, this paper proposes a transformer‐based feature enhancement network (TF‐MVSNet) to facilitate feature representation learning by combining local features (both 2D and 3D) with long‐range contextual information. To reduce memory consumption of feature matching, the cross‐attention mechanism is leveraged to efficiently construct 3D cost volumes under the epipolar constraint. Additionally, a colour‐guided network is designed to refine depth maps at a coarse stage, hence reducing incorrect depth predictions at a fine stage. Extensive experiments were performed on the DTU dataset and Tanks and Temples (T&T) benchmark and results are reported.
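The idea of building a cost volume with cross‐attention under the epipolar constraint can be illustrated with a minimal NumPy sketch. It is not the TF‐MVSNet implementation. It assumes the source‐view features have already been sampled at D depth hypotheses per reference pixel (those samples lie on that pixel's epipolar line), so each query attends to only D candidates instead of every source pixel. The function name `epipolar_cross_attention_cost` and all tensor shapes are hypothetical.

```python
import numpy as np


def epipolar_cross_attention_cost(ref_feat, src_samples):
    """Attention weights of each reference pixel over its epipolar candidates.

    ref_feat:    (H, W, C) reference-view features, used as queries.
    src_samples: (D, H, W, C) source-view features sampled at D depth
                 hypotheses per reference pixel (assumed precomputed).
    Returns:     (D, H, W) softmax weights over the depth axis.
    """
    c = ref_feat.shape[-1]
    # Scaled dot product between each query and its D candidates only.
    scores = np.einsum("hwc,dhwc->dhw", ref_feat, src_samples) / np.sqrt(c)
    # Numerically stable softmax along the depth axis.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=0, keepdims=True)


# Toy usage with synthetic features (illustrative values only).
rng = np.random.default_rng(0)
H, W, C, D = 4, 5, 16, 8
ref = rng.normal(size=(H, W, C))
src = 0.1 * rng.normal(size=(D, H, W, C))
src[2] = ref  # plant a perfect match at depth index 2
cost = epipolar_cross_attention_cost(ref, src)
best_depth_index = cost.argmax(axis=0)  # (H, W) map of winning hypotheses
```

Restricting attention to the D epipolar samples keeps the score tensor at D × H × W entries, whereas attending from every reference pixel to every source pixel would need (H × W)² entries. This is the kind of memory saving the abstract refers to.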

Keywords:
Computer science; Transformer; Artificial intelligence; Feature extraction; Feature (linguistics); Benchmark (surveying); Feature learning; Pattern recognition (psychology); Epipolar geometry; Representation (politics); Matching (statistics); Benchmarking; Computer vision; Mathematics; Image (mathematics); Engineering; Voltage

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 43
Citation Normalized Percentile: 0.02

Topics

Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Enhancement Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Adaptive Feature Enhanced Multi-View Stereo With Epipolar Line Information Aggregation

Shaoqian Wang, Bo Li, Jian Yang, Yuchao Dai

Journal: IEEE Robotics and Automation Letters   Year: 2024   Vol: 9 (11)   Pages: 10439-10446

JOURNAL ARTICLE

CT-MVSNet: Curvature-guided multi-view stereo with transformers

Liang Wang, Licheng Sun, Fuqing Duan

Journal: Multimedia Tools and Applications   Year: 2024   Vol: 83 (42)   Pages: 90465-90486

JOURNAL ARTICLE

Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Ming Han, Hui Yin, Aixin Chong, Qianqian Du

Journal: Applied Intelligence   Year: 2024   Vol: 54 (17-18)   Pages: 7924-7940