Abhinaba Roy, Biplab Banerjee, Vittorio Murino
In this paper, we deal with the problem of zero-shot visual recognition. The standard zero-shot learning (ZSL) pipeline is based on the idea of learning a functional mapping from a visual embedding space to an auxiliary semantic space for a set of seen categories. In the testing phase, the task is to recognize a set of novel categories that are semantically linked to the already known ones. Although such a pipeline is inherently supervised, there exist very few endeavours in the context of ZSL that enforce discrimination in learning this mapping. In this work, we propose a novel encoder-decoder network to explore the possibility of learning an intermediate latent space for the visual features that is intended to be simultaneously reconstructive and discriminative. By reaching a trade-off between the joint (re)construction of the visual and the semantic embedding spaces, while ensuring separability among the known classes, the proposed model generalizes better to the unknown categories. Experimental results obtained on challenging datasets, such as AwA, CUB, and ImageNet-2, establish the efficacy of such a discriminative latent space for the standard ZSL setup.
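The standard ZSL pipeline mentioned in the abstract (a mapping from visual features to a semantic space learned on seen classes, then recognition of unseen classes through their semantic descriptions) can be illustrated with a minimal sketch. This is not the proposed encoder-decoder network. It assumes a ridge-regression mapping and cosine nearest-neighbour classification, and the function names `fit_visual_to_semantic` and `predict_unseen` are hypothetical choices made only for illustration.

```python
import numpy as np


def fit_visual_to_semantic(X, S, reg=1.0):
    """Fit a linear map W (d_vis x d_sem) from visual features to semantic vectors.

    Assumption: ridge regression, i.e. W minimises ||X W - S||^2 + reg * ||W||^2.
    X: (n, d_vis) visual features of seen-class samples.
    S: (n, d_sem) semantic vector (e.g. attributes) of each sample's class.
    """
    d_vis = X.shape[1]
    # Closed-form ridge solution: (X^T X + reg * I) W = X^T S
    return np.linalg.solve(X.T @ X + reg * np.eye(d_vis), X.T @ S)


def predict_unseen(X_test, W, unseen_prototypes):
    """Label each test sample with the index of the closest unseen-class prototype.

    Assumption: closeness is cosine similarity in the semantic space.
    unseen_prototypes: (k, d_sem), one semantic vector per unseen class.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    proj = X_test @ W
    proj = proj / (np.linalg.norm(proj, axis=1, keepdims=True) + eps)
    protos = unseen_prototypes / (
        np.linalg.norm(unseen_prototypes, axis=1, keepdims=True) + eps
    )
    return np.argmax(proj @ protos.T, axis=1)
```

In use, `W` would be fitted on features and class semantic vectors from the seen categories only, and `predict_unseen` would then be called with the semantic vectors of the novel categories. Nothing in this sketch enforces discrimination among the seen classes, which is the gap the abstract points to.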
Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao
Christoph H. Lampert, Hannes Nickisch, Stefan Harmeling
Run-Qing Chen, Songsong Wu, Guangcheng Sun
Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh L. Nguyen, Matthias Hein, Bernt Schiele