DIF-DETR: Dynamic Interactive Fusion Transformer with Adaptive Feature Enhancement for Efficient Aerial Small Object Detection

Jing Wang; Hejiang Li; Caihong Huangfu

doi:10.54097/b3psbw85

Year: 2025; Journal: Journal of Computer Science and Artificial Intelligence; Vol: 5 (3); Pages: 9-19

Abstract

In recent years, Transformer-based object detection models have demonstrated outstanding performance in general scenarios thanks to their powerful global feature modeling. When applied directly to aerial image detection, however, their performance often falls short of expectations. The root cause lies in the nature of aerial imagery, which typically contains numerous small objects. These objects occupy an extremely small proportion of the image's pixels, which yields weak feature representations, and they are susceptible to complex background noise and to mutual interference among densely distributed targets. Together, these factors make it difficult for Transformer models to capture and distinguish small-object features. To address these challenges, this paper proposes Dynamic Interactive Fusion DETR (DIF-DETR), an enhanced Transformer architecture for aerial small object detection. Its core innovations are twofold. First, it introduces DIENet, a backbone feature extraction network embedded with DIEBlocks. These blocks serve as feature enhancement units within the backbone, using dynamic Inception-style multi-branch depthwise convolutions and adaptive weight allocation to efficiently capture multi-scale, long-range contextual information. Second, it introduces Context-Aware Bidirectional Fusion (CABF), which adaptively and complementarily fuses high-level semantic features with low-level detail features within the FPN-PAN structure of the neck network, mitigating the problem of small-object features being obscured by background interference. Experiments on the challenging VisDrone and HIT-UAV aerial datasets show that DIF-DETR outperforms existing mainstream models, reaching 30.5% mAP and 82.3% mAPtest, respectively.
Meanwhile, it reduces the computational cost to 43.6 GFLOPs with only 13.4M parameters, achieving an optimal balance between detection accuracy and computational efficiency. These results indicate that, through the synergy of its core innovations, DIF-DETR substantially improves detection accuracy and robustness for small objects in aerial images, providing an effective solution for object detection in aerial scenarios.
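The abstract describes DIEBlocks only at a high level. As a rough illustration of what "dynamic Inception-style multi-branch depthwise convolutions with adaptive weight allocation" can mean, the NumPy sketch below runs parallel depthwise branches with different kernel sizes and mixes them with softmax weights computed from a globally pooled descriptor. This is not the authors' implementation: the kernel sizes (3, 5, 7), the pooling-plus-linear gate (`gate_w`), the residual connection, and every function name are assumptions made for illustration. A small helper also shows why depthwise branches are cheap: a k x k depthwise layer has k*k*C weights, versus k*k*C_in*C_out for a dense convolution.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """'Same'-padded depthwise 2-D cross-correlation.

    x: (C, H, W) feature map; kernels: (C, k, k) with odd k, one filter per channel.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H and W unchanged
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            # each channel is filtered only by its own kernel (no channel mixing)
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def dynamic_multibranch_block(x, branch_kernels, gate_w):
    """Hypothetical DIEBlock-like unit; NOT the authors' code.

    branch_kernels: list of (C, k, k) arrays, one per branch, with different k.
    gate_w: (num_branches, C) assumed linear gate on the pooled descriptor.
    """
    branches = [depthwise_conv2d(x, kern) for kern in branch_kernels]
    descriptor = x.mean(axis=(1, 2))        # global average pooling -> (C,)
    weights = softmax(gate_w @ descriptor)  # input-dependent weights summing to 1
    fused = sum(wt * b for wt, b in zip(weights, branches))
    return x + fused, weights               # residual connection (assumed)


def conv_weight_count(k, c_in, c_out, depthwise=False):
    """Weight count of a k x k convolution layer, biases ignored."""
    return k * k * c_in if depthwise else k * k * c_in * c_out


rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
x = rng.standard_normal((C, H, W))
kernels = [0.1 * rng.standard_normal((C, k, k)) for k in (3, 5, 7)]
gate_w = rng.standard_normal((3, C))
y, weights = dynamic_multibranch_block(x, kernels, gate_w)
print(y.shape)  # (4, 16, 16)
print(conv_weight_count(7, 256, 256, depthwise=True))  # 12544
print(conv_weight_count(7, 256, 256))                  # 3211264
```

With C = 256 and k = 7, the depthwise layer needs 12,544 weights against 3,211,264 for a dense 256-to-256 convolution. Savings of this kind are what let multi-branch designs stay lightweight; the 13.4M parameters and 43.6 GFLOPs reported above come from the paper's full architecture, not from this toy.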
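CABF is likewise described only in outline. One plausible reading of "adaptive complementary fusion" in a bidirectional FPN-PAN neck is a gate g, computed from the pooled context of two adjacent pyramid levels, that weights the top-down path by g and the bottom-up path by 1 - g. The sketch below assumes equal channel counts at both levels (real necks usually align channels with 1x1 convolutions), nearest-neighbour upsampling, 2x2 average pooling for downsampling, and a linear gate `w_gate`; all of these, and the function names, are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour upsampling: (C, H, W) -> (C, 2H, 2W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def avgpool2x(x):
    """2 x 2 average pooling: (C, 2H, 2W) -> (C, H, W); H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bidirectional_gated_fusion(low, high, w_gate):
    """Hypothetical CABF-like fusion of two pyramid levels; NOT the authors' code.

    low:    (C, 2H, 2W) shallow, high-resolution detail features
    high:   (C, H, W)   deep, low-resolution semantic features
    w_gate: (C, 2C)     assumed linear map from pooled context to per-channel gates
    """
    context = np.concatenate([low.mean(axis=(1, 2)), high.mean(axis=(1, 2))])
    g = sigmoid(w_gate @ context)[:, None, None]  # per-channel gate in (0, 1)
    low_out = low + g * upsample2x(high)          # top-down: add semantics to details
    high_out = high + (1.0 - g) * avgpool2x(low)  # bottom-up: add details to semantics
    return low_out, high_out


rng = np.random.default_rng(1)
C = 8
low = rng.standard_normal((C, 32, 32))
high = rng.standard_normal((C, 16, 16))
w_gate = 0.1 * rng.standard_normal((C, 2 * C))
low_out, high_out = bidirectional_gated_fusion(low, high, w_gate)
print(low_out.shape, high_out.shape)  # (8, 32, 32) (8, 16, 16)
```

Setting `w_gate` to zeros gives g = 0.5 for every channel, so both paths contribute equally; a learned gate would instead let each channel lean toward detail or semantics depending on the image content.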

Keywords:

Object detection; Aerial image; Feature extraction; Transformer; Sensor fusion; Pattern recognition; Inference
