Drone-based object detection faces critical challenges, including tiny objects, complex urban backgrounds, dramatic scale variations, and the loss of high-frequency detail during feature propagation. Current detection methods struggle to address these challenges effectively while maintaining computational efficiency. We propose the Scale-Frequency Detection Transformer (SF-DETR), a novel end-to-end framework for drone-view scenarios. SF-DETR introduces a lightweight ScaleFormerNet backbone with Dual Scale Vision Transformer modules, a Bilateral Interactive Feature Enhancement Module, and a Multi-Scale Frequency-Fused Feature Enhancement Network. Extensive experiments on the VisDrone2019 dataset demonstrate SF-DETR's superior performance: it achieves 51.0% mAP50 and 31.8% mAP50:95, surpassing state-of-the-art methods such as YOLOv9m and RTDETR-r18 by 6.2% and 4.0%, respectively. Further validation on the HIT-UAV dataset confirms the model's generalization capability. Our work establishes a new benchmark for drone-view object detection and provides a lightweight architecture suitable for deployment on embedded devices in real-world aerial surveillance applications.
Weixi Wang, Ye Liu, Zongyong Cui, Liang Shan, Bo-Chao Zheng, Zhong Wei, Xiaoqing Zhang, Satoshi Yamane
Bei Liu, Jiangliang Jin, Yihong Zhang, Chen Sun
Yi Mao, Haowei Zhang, Rui Li, Feng Zhu, Rui Sun, Ping Ji
Linhui Dai, Hong Liu, Hao Tang, Zhiwei Wu, Pinhao Song
Fanglin Liu, Qinghe Zheng, Xinyu Tian, Feng Shu, Weiwei Jiang, Miaohui Wang, Abdussalam Elhanashi, Sergio Saponara