Yan Li, Yingdan Wu, Ming Yang, Yong Zhang, Zhesheng Cheng
Attention mechanisms have achieved remarkable success in computer vision. In image matching under large deformations, however, attention often suffers from divergence. To address this, we propose GeoAT, a geometry-aware, attention-based matching method. GeoAT uses low-resolution image features to obtain global geometric constraints between the two images, expressed as an affine transformation matrix derived in the coarse matching stage. A key innovation is that this matrix guides the cross-attention computation on high-resolution features in the intermediate matching stage, so the network focuses primarily on regions with potential correspondences and pays minimal attention to irrelevant areas. This improves both the precision and the efficiency of correspondence identification and yields accurate matching in a coarse-to-fine manner. GeoAT employs a flexible window-based attention mechanism that integrates self-attention and cross-attention structures and fuses multi-scale features with positional information. It performs well in scenarios with large angle variations and achieves highly accurate matches even under high matching thresholds. We evaluated GeoAT on two well-known datasets, HPatches and MegaDepth, where it outperformed the current state-of-the-art algorithm, LoFTR, by 3-5% across various metrics. Future work will focus on further optimizing computational efficiency and enhancing robustness.
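The idea of letting a coarse affine estimate restrict cross-attention can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the affine matrix is estimated or how the attention window is shaped, so the least-squares fit, the circular window of fixed `radius`, and the function names `estimate_affine`, `window_mask`, and `guided_cross_attention` are all assumptions made for illustration.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares 2x3 affine A with dst ~= A @ [x, y, 1]^T.

    Assumption: coarse matches are given as (N, 2) point arrays, N >= 3.
    """
    homog = np.hstack([src, np.ones((src.shape[0], 1))])   # (N, 3)
    sol, *_ = np.linalg.lstsq(homog, dst, rcond=None)      # (3, 2)
    return sol.T                                           # (2, 3)


def window_mask(affine, q_xy, k_xy, radius):
    """Boolean (Nq, Nk) mask: True where a key position lies within
    `radius` of the affine-projected query position.

    Assumption: a circular window; the paper's window may differ.
    """
    homog = np.hstack([q_xy, np.ones((q_xy.shape[0], 1))])
    projected = homog @ affine.T                           # (Nq, 2)
    diff = projected[:, None, :] - k_xy[None, :, :]
    return np.linalg.norm(diff, axis=-1) <= radius


def guided_cross_attention(q, k, v, mask):
    """Scaled dot-product cross-attention restricted to `mask`.

    Queries whose window contains no key receive all-zero weights
    (and therefore a zero output) instead of NaN.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    masked = np.where(mask, scores, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)                     # exp(-inf) == 0
    denom = weights.sum(axis=1, keepdims=True)
    weights = weights / np.where(denom > 0, denom, 1.0)
    return weights @ v, weights
```

In this sketch the mask plays the role of the geometric constraint: keys far from where the affine map predicts a query should land get zero attention weight, so the softmax is spent only on plausible correspondences. A learned or adaptive window could replace the fixed radius without changing the overall structure.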
Yepeng Liu, Wei-Yu Lai, Zhou Zhao, Yuxuan Xiong, Jinchi Zhu, Jun Cheng, Yongchao Xu
Dihe Huang, Ying Chen, Yong Liu, Jianlin Liu, Shang Xu, Wenlong Wu, Yikang Ding, Fan Tang, Chengjie Wang
Linbo Wang, Binbin Chen, Peng Xu, Honglong Ren, Xianyong Fang, Shaohua Wan
Rajvi Shah, Vanshika Srivastava, P. J. Narayanan