Qi Bi, Jingjun Yi, Haolan Zhan, Wei Ji, Bo Du
Fine-grained image analysis is widely recognized as highly challenging, since distinguishing individual differences within a given category, species, or type often depends on tiny, subtle patterns. However, learning fine-grained semantic categories from these subtle part patterns is inherently fragile, as they can easily be overwhelmed by the dominant patterns residing in the coarse-category information. The key, therefore, is to strengthen the relation between the fine-grained semantics and these subtle patterns. To this end, this paper proposes a novel semantic-part alignment (SPA) learning scheme. Its general idea is to first measure the relevance of each part to the fine-grained semantics, and then to regularize the fine-grained visual representation learning. Specifically, it consists of three key components: joint semantic-part modeling, semantic-part set modeling, and optimal semantic-part transport. The joint semantic-part modeling associates each part in an image with the fine-grained semantics in a latent space. The optimal semantic-part transport component is then devised to enhance the relation between the fine-grained semantic embeddings and the discriminative part embeddings. Notably, the proposed SPA is plug-and-play, easy to implement, and insensitive to the latent embedding dimension and the loss weight. Experiments show that the proposed method can substantially boost performance on multiple fine-grained image analysis tasks across various baselines.
Shijie Wang, Zhihui Wang, Haojie Li, Jianlong Chang, Wanli Ouyang, Qi Tian
Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi
Qian Chen, Li Liu, Xiaodong Fu, Lijun Liu, Qingsong Huang (Computer Technology Application Key Lab of Yunnan Province, Kunming 650500, China)
Jiashui Wang, Peng Qian, Xilin Huang, Xinlei Ying, Yan Chen, Shouling Ji, Jianhai Chen, Jundong Xie, Long Liu