L. Zhao, Jianlong Wang, Yunhao Chen, Qian Yin, Guyao Rong, Si-Da Zhou, Jianing Tang
This study introduces the Infrared Small-Target Detection Transformer (IST-DETR), a novel model designed to address the challenges of low-resolution infrared images and small target scales. IST-DETR integrates a backbone network, a hybrid encoder featuring a Mutual Feature Screening (MFS) module, and a decoder with auxiliary prediction heads. The hybrid encoder employs Learning Position Encoding to reduce information redundancy and uses a mutual feature screening mechanism to enhance the interaction between high-level semantic features and low-level positional features, enabling more accurate detection of small infrared targets. In addition, a customized IoU metric and a novel sample weighting function are employed to address dataset imbalance, significantly improving detection performance. Experiments on the FLIR, HIT-UAV, and IVFlying datasets yielded an average precision (AP) of 44.1%, 34.0%, and 58.5%, respectively, with a processing speed of 74 frames per second. IST-DETR outperforms contemporary algorithms such as YOLOv8, CO-DETR, and DINO, demonstrating a superior balance of speed and accuracy, particularly in recognizing small infrared targets across diverse and complex scenarios.
Ben Teng, Hankiz Yilahun, Shuxian Liu, Askar Hamdulla
Xiaolong Wei, Ling Yin, Liangliang Zhang, Fei Wu
Mingke Zhang, Xiamin Guo, Junjie Hou, Wei Zhang
Zhan Sun, Chaofeng Li, Fenglin Man
Huanyu Yang, Jun Wang, Yuming Bo, Jiacun Wang