In recent years, with the rapid development of artificial intelligence, object detection in remote sensing images has attracted wide attention in computer vision. Unlike object detection in ordinary images, object detection in remote sensing images must cope with many object categories, a low proportion of object semantic information, and an unbalanced number of objects across categories. To address these problems, an object detection method based on contrastive learning representations is proposed to improve remote sensing object detection performance. The backbone network is built on a contrastive language-image pretrained model (CLIP) trained on image-text pairs; its representations, learned in zero-shot and transfer learning settings, effectively increase the number of object categories that can be detected in remote sensing images. The model is further extended with an R-CNN structure so that it learns region-level visual representations, enabling fine-grained alignment between image regions and text concepts. This addresses the missed detections caused by the many object categories, the low proportion of object semantic information, and the unbalanced number of objects across categories. Experimental results show that the proposed method based on image-text representation learning can effectively detect multiple targets with a low information proportion in remote sensing images. Compared with the original R-CNN, it detects more types of objects in remote sensing images, and the number of objects it detects is also higher than that of other typical remote sensing object detection methods, which demonstrates the effectiveness of the method.
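The abstract relies on two mechanisms: CLIP-style contrastive training, which pulls matching image-text embedding pairs together and pushes mismatched pairs apart, and region-level alignment, which lets a detector label a region by comparing its embedding with text embeddings of category names. The sketch below illustrates both in plain Python, assuming region and text embeddings have already been produced by some image and text encoders. The function names, toy vectors, and category names are hypothetical illustrations rather than the authors' implementation, and pairing region `i` with text `i` follows CLIP's batch-wise objective; the abstract does not say how the paper pairs regions with text concepts.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def similarity_matrix(regions: List[Vector], texts: List[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    """Cosine similarity of every region with every text, divided by a temperature.

    0.07 is an illustrative value (CLIP's initial temperature); it is an assumption here.
    """
    r = [normalize(v) for v in regions]
    t = [normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(ri, tj)) / temperature for tj in t] for ri in r]


def cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single row of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(regions: List[Vector], texts: List[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: region i is assumed to match text i (the diagonal)."""
    logits = similarity_matrix(regions, texts, temperature)
    n = len(logits)
    # Region-to-text direction: each row should peak at its own index.
    region_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-region direction: each column should peak at its own index.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_region = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (region_to_text + text_to_region) / 2


def zero_shot_label(region: Vector, class_texts: List[Vector],
                    class_names: List[str]) -> str:
    """Return the category whose text embedding is most similar to the region."""
    scores = similarity_matrix([region], class_texts)[0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best]


# Toy usage: one-hot "text embeddings" stand in for encoded category prompts.
names = ["airplane", "ship", "storage tank"]
text_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(zero_shot_label([0.1, 0.9, 0.2], text_emb, names))  # ship
```

Because classification reduces to a nearest-text lookup, adding a new category only requires encoding its name, which is why a language-image backbone can widen the set of detectable object types without retraining a fixed classifier head.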
Weidong Yan, Chaosheng Zhu, Mengtian Wang, D. X. Yu, Zhen Zou, Tianyi Xia
Zhan Cong Tan, Xiao Feng Du, Man Wang, Xiao Zhu Xie, Gui Song Wang, Qin Nie
Xiaodong Mu, Kun Bai, Xuan-ang You, Yongqing Zhu, Xue-bing Chen
PLA Unit 61068, Xi’an, Shaanxi 710100, China
Haonan Zhou, Hui Tang, Xiangchun Liu, Xiaoxiao Shi, Lurui Xia
Zhibao Wang, Xiaoqing He, Bin Xiao, Liangfu Chen, Xiuli Bi