Yu Gao, Chenwei Deng, Liang Chen, Zicong Zhu
With the rapid development of remote sensing imaging and deep learning technology, fine-grained recognition of rigid targets has gradually emerged. Rigid targets in remote sensing scenes usually retain relatively stable scale information and apparent structure, providing an adequate basis for their discrimination. However, existing methods fail to fully exploit this scale information and apparent structure, which results in scale neglect and insufficient discriminative feature extraction (DFE). In response to these challenges, we propose SD-Net, a training framework for fine-grained recognition of rigid objects in remote sensing scenes. It consists of a fused label learning process based on a probability distribution function (PDF) and a DFE branch. The PDF module gathers the objects' scale statistics by category, builds a probability model from the sample distribution, and finally converts it into a soft-label form to guide model learning. The DFE branch extracts discriminative features along the channel and spatial dimensions of the feature map through deep feature mining over a wide receptive range. Finally, we propose the FAIR1M-OR dataset, containing 37 fine-grained categories and about 600 000 instances, to verify the method's effectiveness. The experimental results show that, while introducing only a small number of parameters during training, SD-Net improves the performance of ResNet- and ViT-based models by about 4.6 points. The code and dataset will be open-sourced in the future.
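To make the fused-label idea concrete, the following is a minimal sketch (not the authors' implementation) of how per-class scale statistics could be turned into a soft label. All names and the Gaussian modeling choice are assumptions; the sketch models each category's scale as a Gaussian, evaluates the likelihood of an observed object scale under every category, and blends the normalized likelihoods with the one-hot label.

```python
import numpy as np

def scale_soft_label(scale, class_means, class_stds, hard_label, alpha=0.7):
    """Hypothetical sketch of PDF-based fused label learning.

    scale       -- observed scale of the object (e.g., sqrt of box area)
    class_means -- per-category mean scale, shape (C,)
    class_stds  -- per-category scale std, shape (C,)
    hard_label  -- ground-truth category index
    alpha       -- mixing weight between the one-hot label and the scale prior
    """
    class_means = np.asarray(class_means, dtype=float)
    class_stds = np.asarray(class_stds, dtype=float)
    # Gaussian likelihood of the observed scale under each category's PDF
    likelihood = np.exp(-0.5 * ((scale - class_means) / class_stds) ** 2) / class_stds
    prior = likelihood / likelihood.sum()  # normalize to a distribution
    one_hot = np.eye(len(class_means))[hard_label]
    # Soft label: mostly the true label, softened by the scale prior
    return alpha * one_hot + (1 - alpha) * prior
```

Such a soft label can replace the one-hot target in a cross-entropy loss, so that categories with overlapping scale distributions receive correlated supervision.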