Yaxiong Chen, Jinghao Huang, Shengwu Xiong, Xiaoqiang Lu
Hashing has attracted considerable attention in recent work on cross-modal remote sensing (RS) image-audio retrieval. Most existing methods focus on mapping RS images and audio into a common Hamming space, while neglecting the discriminative information in RS images and the fine-grained alignment between RS images and audio. In this paper, we address both limitations with a novel Fine Aligned Discriminative Hashing (FADH) approach, which learns hash codes that capture the discriminative information of RS images and, simultaneously, the detailed correspondence between RS images and audio. We first develop a new discriminative information learning module to learn the discriminative information of RS images. In parallel, we propose a fine alignment module that uncovers the fine-grained correspondence between RS image regions and audio, which effectively improves retrieval performance. Building on these two modules, we design a new objective function that maintains the similarity of hash codes, preserves the semantic information of RS image features and audio features, and eliminates cross-modal differences. Experiments on three RS image-audio datasets demonstrate the reliability and effectiveness of the proposed framework.
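To make the retrieval setting concrete, the following is a minimal sketch of generic Hamming-space ranking, not the FADH method itself. It assumes that some learned encoders have already produced real-valued feature vectors for an audio query and for a set of RS images; the function names (`binarize`, `hamming_distance`, `retrieve`), the sign-based thresholding, and the toy feature values are all illustrative assumptions and do not come from the paper.

```python
def binarize(features):
    """Threshold real-valued features at zero to get a binary hash code.

    Assumption: sign thresholding stands in for whatever quantization
    a trained hashing network actually applies.
    """
    return [1 if x >= 0 else 0 for x in features]


def hamming_distance(code_a, code_b):
    """Count the bit positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Toy, made-up features: one audio query and three RS images.
audio_feature = [0.7, -1.2, 0.3, -0.1]
image_features = [
    [-0.5, 0.9, -0.2, 0.4],   # every sign differs from the query
    [0.5, -0.9, 0.2, -0.4],   # same signs as the query
    [-0.5, -0.9, 0.2, -0.4],  # one sign differs from the query
]

query_code = binarize(audio_feature)
image_codes = [binarize(f) for f in image_features]

distances = [hamming_distance(query_code, c) for c in image_codes]
ranking = retrieve(query_code, image_codes)
print(distances)  # [4, 0, 1]
print(ranking)    # [1, 2, 0]
```

Because the codes are binary, ranking needs only bit comparisons, which is the usual motivation for hashing-based retrieval. How the codes are learned (the discriminative and fine alignment modules and the objective function) is outside the scope of this sketch.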
Jiajun Zhu, Xingbo Liu, Xuening Zhang, Xiushan Nie
Wenxi Lang, Han Sun, Can Xu, Ningzhong Liu, Huiyu Zhou
Xu Tang, Yuqun Yang, Jingjing Ma, Yiu-ming Cheung, Chao Liu, Fang Liu, Xiangrong Zhang, Licheng Jiao
Xiaojie Liu, Xiliang Chen, Guobin Zhu
Peng Li, Xiaoyu Zhang, Xiaobin Zhu, Peng Ren