Bo Zhao, Xingzhuo Wei, Xianyun Wu, Yuqian Cheng
Identifying objects of interest in remote sensing imagery requires not only accurate classification but also precise delineation of each object's location, scale, and orientation. However, object detection in such imagery faces unique challenges—including complex and cluttered backgrounds, large variations in object scale, and a high prevalence of small targets—that distinguish remote sensing data from natural images. Most conventional CNN-based detection frameworks rely heavily on hand-crafted components such as anchor boxes and Non-Maximum Suppression (NMS), which add complexity and hinder end-to-end optimization. To address these limitations, this study adopts DETR, a Transformer-based end-to-end detection architecture, as the foundational framework. Through set-based prediction, DETR outputs bounding boxes directly, without post-processing heuristics, enabling more streamlined and globally coherent detection. We further adapt this framework with domain-specific enhancements tailored to the characteristics of remote sensing imagery. The proposed DETR-based rotated object detector represents a significant step in applying Transformer architectures to remote sensing tasks. Experimental results show that the refined model achieves a mean Average Precision (mAP) of 77.50% on the DOTA dataset, reflecting a notable improvement in detection accuracy and highlighting the strong potential of Transformer-based models in this domain.
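The set-based prediction the abstract refers to works by matching each ground-truth box one-to-one against the model's predictions with a minimum-cost bipartite assignment, so duplicate detections are penalized by the loss itself and NMS becomes unnecessary. The sketch below illustrates the idea on toy boxes; it is not the authors' implementation, and it uses a brute-force search over assignments where DETR uses the Hungarian algorithm (and a cost combining classification, L1, and generalized-IoU terms rather than plain L1).

```python
# Minimal sketch of DETR-style set-based (bipartite) matching -- an
# illustration, not the paper's code. Each ground-truth box is assigned to
# exactly one predicted box by minimizing a pairwise cost, so duplicate
# predictions are suppressed by the matching loss rather than by NMS.
from itertools import permutations

def l1_cost(pred, gt):
    # Pairwise L1 distance between (cx, cy, w, h) boxes; DETR's actual
    # matching cost also includes classification and generalized-IoU terms.
    return sum(abs(p - g) for p, g in zip(pred, gt))

def match(preds, gts):
    # Brute-force optimal one-to-one assignment. DETR uses the Hungarian
    # algorithm, which scales polynomially; this is fine for tiny examples.
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best  # best[g] = index of the prediction matched to ground truth g

# Toy example: 3 object queries (predictions), 2 ground-truth boxes.
preds = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.3, 0.3), (0.9, 0.9, 0.1, 0.1)]
gts = [(0.12, 0.1, 0.3, 0.28), (0.5, 0.52, 0.2, 0.2)]
print(match(preds, gts))  # -> (1, 0): gt 0 matches pred 1, gt 1 matches pred 0
```

The unmatched prediction (index 2 here) is trained toward a "no object" class, which is how the architecture discards spurious boxes without any post-processing step.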