In recent years, object detection in UAV images has seen tremendous progress. However, small object detection remains a major challenge. The poor performance of detectors on small objects can be attributed to several factors. On one hand, deep networks are inherently limited with regard to small objects because of their multiple layers of abstraction. On the other hand, most training datasets suffer from class and size imbalance, which hinders the model's ability to generalize over a wide range of scales. To overcome these limitations and improve detection accuracy for small objects, we introduce an improved feature pyramid structure based on an attentive feature fusion factor, a lightweight joint attention mechanism that guides multi-scale feature fusion across multiple FPN layers to avoid losing small object information. We also propose a novel copy-paste data augmentation scheme to mitigate the size imbalance across datasets, allowing small objects to contribute more to the overall training loss. We evaluate our model on both the MS COCO and VisDrone datasets. Experimental results on MS COCO show improvements of 1.4% and 2.7% on $\text{mAP}_{small}$ and AP, respectively, over the baseline. On VisDrone, we achieve competitive results compared to SOTA detectors.
Xu Cao, Xizheng Zhang, Yingjun Hou, Zhangyu Lu, Ruoyuan Liu, Qin Wei
Fauzan Masykur, Angga Prasetyo, Ismail Abdurrozaq Zulkarnain, Ellisia Kumalasari, Pradityo Utomo