Zero-Shot Learning (ZSL) aims to recognize images from both seen and unseen classes by aligning visual and semantic knowledge (e.g., attribute descriptions). However, fine-grained attributes in the RGB domain are easily affected by background noise (e.g., a grey bird tail blending with the ground), which makes them difficult to distinguish. Analyzing features in the frequency domain helps distinguish the attributes, since their patterns remain consistent across images whereas noise tends to be more variable. Nevertheless, existing ZSL methods typically learn visual features directly in the RGB domain, which can impede the recognition of certain attributes. To overcome this limitation, we propose a novel ZSL method, the Frequency-based Phase Augmentation (FPA) network, which learns an effective representation of the attributes in the frequency domain. Specifically, we introduce a Hybrid Phase Augmentation (HPA) module that transforms visual features into the frequency domain and augments the phase component to better retain the semantic information of the attributes. The phase-augmented features enable FPA to capture semantic knowledge that is challenging to distinguish in the RGB domain, suppress noise, and highlight significant attributes. Extensive experiments show that FPA achieves state-of-the-art performance on four standard datasets.
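To make the idea of working on the phase component concrete, the following is a minimal NumPy sketch of an amplitude/phase decomposition of a feature map. It is not the HPA module, whose design is not specified in this abstract. The function names, the `(C, H, W)` feature layout, and the `alpha` blending rule in `emphasize_phase` are all hypothetical choices made for illustration only.

```python
import numpy as np


def to_amplitude_phase(feat):
    """Split a real (C, H, W) feature map into amplitude and phase spectra."""
    spec = np.fft.fft2(feat, axes=(-2, -1))  # 2D FFT over the spatial axes
    return np.abs(spec), np.angle(spec)


def from_amplitude_phase(amplitude, phase):
    """Rebuild a spatial feature map from amplitude and phase spectra."""
    spec = amplitude * np.exp(1j * phase)
    # The input was real, so any imaginary part here is numerical noise.
    return np.fft.ifft2(spec, axes=(-2, -1)).real


def emphasize_phase(feat, alpha=0.5):
    """Hypothetical augmentation (an assumption, not the paper's method):
    flatten the amplitude toward its per-channel mean and keep the phase."""
    amplitude, phase = to_amplitude_phase(feat)
    mean_amp = amplitude.mean(axis=(-2, -1), keepdims=True)
    flat_amp = (1.0 - alpha) * amplitude + alpha * mean_amp
    return from_amplitude_phase(flat_amp, phase)
```

With `alpha=0.0` the function returns the input unchanged (a plain FFT round trip). Larger values of `alpha` make the output depend more on the phase spectrum and less on the amplitude spectrum, which is one simple way to probe how much structure the phase alone carries.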
Wanting Yin, Jiannan Ge, Lei Zhang, Pandeng Li, Yizhi Liu, Hongtao Xie
Bin Tong, Martin Klinkigt, Junwen Chen, Xiankun Cui, Quan Kong, Tomokazu MURAKAMI, Yoshiyuki Kobayashi
Rafael Felix, Michele Sasdelli, Ian Reid, Gustavo Carneiro
Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang