Object detection in aerial images has received increasing attention because of its wide range of applications. However, it remains a challenging task in difficult cases, such as large variations in object size, complex backgrounds in large-scale views, and densely packed small targets. To address these cases, we propose a self-adaptive object detection network for aerial images based on feature enhancement, comprising a region crop module with soft attention (RCP), a feature enhancement module (FET), and a self-adaptive feature extraction module (SAE). First, the RCP module uses soft attention to roughly crop dense and sparse subregions into patches according to the variance of the feature maps, for subsequent feature extraction and detection. Second, the FET module acquires more semantic detail through feature enhancement, which compensates for the information lost during downsampling, especially for small objects in dense regions. Finally, the SAE module identifies multiple-target regions and single-target regions in the dense patches. The similarity of adjacent dense areas within each dense patch is calculated, and the search range is gradually narrowed by repeatedly merging the areas with the largest similarity. A teacher network and a student network are used to extract features for multiple-target regions and single-target regions, respectively. The proposed design improves the accuracy of real-time detection from the drone's perspective. Extensive experiments and comprehensive evaluations on the VisDrone2019-DET dataset demonstrate the effectiveness and adaptability of the proposed method. Our source code is available at https://github.com/zhaokai152.
Shihan Mao, Zhi Wang, Qineng He, Zhangqing Zhu
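
The merging step that the abstract attributes to the SAE module (compute the similarity of adjacent dense areas, then repeatedly merge the most similar pair) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that areas are ordered along one axis so that "adjacent" means consecutive, that each area is summarized by a feature vector, that similarity is cosine similarity, and that merging stops once the best similarity drops below a threshold. The names `cosine`, `merge_adjacent`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_adjacent(features, threshold=0.9):
    """Greedily merge consecutive areas, most similar pair first.

    features: list of feature vectors, one per area (assumed 1-D ordering).
    Returns a list of groups, each a list of original area indices.
    """
    # Each region keeps its member indices and a size-weighted mean feature.
    regions = [([i], list(f)) for i, f in enumerate(features)]
    while len(regions) > 1:
        # Similarity between every pair of neighbouring regions.
        sims = [cosine(regions[i][1], regions[i + 1][1])
                for i in range(len(regions) - 1)]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] < threshold:
            break  # assumed stopping rule: no pair is similar enough
        (idx_a, feat_a), (idx_b, feat_b) = regions[best], regions[best + 1]
        n_a, n_b = len(idx_a), len(idx_b)
        merged = [(n_a * x + n_b * y) / (n_a + n_b)
                  for x, y in zip(feat_a, feat_b)]
        # Replace the two neighbours with the merged region.
        regions[best:best + 2] = [(idx_a + idx_b, merged)]
    return [idx for idx, _ in regions]


groups = merge_adjacent([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(groups)
```

In this toy input, the first two areas and the last two areas point in nearly the same direction, so they end up in two separate groups. How the resulting groups would map onto multiple-target and single-target regions, and which similarity measure and stopping criterion are actually used, is not specified in the abstract.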