Xiang Yu, Ying Qian, Guodong Jin, Zhe Geng, Daiyin Zhu
Multi-view Synthetic Aperture Radar (SAR) imagery provides rich information for target recognition, but fusing features from unaligned multi-view images remains challenging for existing methods. Conventional early-fusion methods rely on image registration, a process that is computationally intensive and can introduce feature distortions. More recent registration-free approaches built on the Transformer architecture are constrained by standard position encodings, which were not designed to represent the rotational relationships among multi-view SAR images and can therefore cause spatial ambiguity. To address this limitation, we propose a registration-free fusion framework based on a spatially aware Transformer with two key components: (1) a multi-view polar-coordinate position encoding that models the geometric relationships of patches both within and across views in a unified coordinate system; and (2) a spatially aware self-attention mechanism that injects this geometric information as a learnable inductive bias. Experiments on our self-developed FAST-Vehicle dataset, which provides full 360° azimuthal coverage, show that our method outperforms both registration-based strategies and Transformer baselines that use conventional position encodings. These results indicate that, for multi-view SAR fusion, explicitly modeling the underlying geometric relationships with a suitable position encoding is an effective alternative to physical image registration or generic, single-image position encodings.
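The abstract's first component can be loosely illustrated as follows: express each patch center in polar coordinates, offset the angular coordinate by the view's azimuth so all views share one frame, and form pairwise radial/angular differences that an attention layer could map to a learnable bias. This is a minimal sketch under our own assumptions (grid size, azimuth handling, difference tables); it is not the paper's implementation.

```python
import numpy as np

def polar_positions(grid, view_azimuth_deg):
    """Polar coordinates of patch centers in one view; the angle is shifted
    by the view's azimuth so all views land in a shared coordinate system.
    (Hypothetical simplification of the multi-view polar position encoding.)"""
    ys, xs = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    cy = cx = (grid - 1) / 2.0                    # image center
    dy, dx = ys - cy, xs - cx
    r = np.hypot(dy, dx).ravel()                  # radial distance per patch
    theta = np.arctan2(dy, dx).ravel()            # in-view angle per patch
    theta = theta + np.deg2rad(view_azimuth_deg)  # rotate into the shared frame
    return r, np.mod(theta, 2 * np.pi)

def pairwise_geometry(view_azimuths_deg, grid=4):
    """Stack patches from all views and return pairwise (|dr|, wrapped dtheta)
    tables; a spatially aware attention could embed these as a learnable bias
    added to the attention logits."""
    rs, ts = [], []
    for az in view_azimuths_deg:
        r, t = polar_positions(grid, az)
        rs.append(r)
        ts.append(t)
    r = np.concatenate(rs)
    t = np.concatenate(ts)
    dr = np.abs(r[:, None] - r[None, :])                   # radial difference
    dt = np.angle(np.exp(1j * (t[:, None] - t[None, :])))  # wrapped angle diff
    return dr, dt
```

With two views at 0° and 90° azimuth, corresponding patches have zero radial difference and a wrapped angular difference of -90°, so the cross-view rotation is represented explicitly rather than left for registration to remove.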