Wenbing Liu, Haibo Wang, Quanxue Gao, Zhaorui Zhu
Abstract Since single-modal data usually contain limited information, a great deal of effort has been devoted to exploiting the complementary information contained in multi-modal data across various patterns. This paper therefore proposes an object detection method that fully utilizes multi-modal data. First, the method introduces a transformer mechanism to fuse intra-modal and inter-modal features across different modalities, exploiting the complementarity between modalities to improve multi-modal object detection performance. Second, a contrastive loss suited to contrastive learning is applied, enabling label information to be used effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
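The two ingredients named in the abstract, inter-modal feature fusion via attention and a label-aware contrastive loss, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy RGB/thermal features, and the temperature value are all hypothetical, and the attention here is a single unprojected scaled dot-product head rather than a full transformer block.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: tokens of one modality (queries)
    attend over tokens of another modality (keys/values)."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Each fused token is a convex combination of the other modality's values.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

def sup_con_loss(feats, labels, tau=0.5):
    """Supervised contrastive loss sketch: pull together L2-normalized
    features that share a label, push apart those that do not."""
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(feats[i], feats[j])) / tau
                for j in range(n)]
        denom = sum(math.exp(sims[j]) for j in range(n) if j != i)
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        total += -sum(math.log(math.exp(sims[p]) / denom) for p in pos) / len(pos)
        count += 1
    return total / max(count, 1)

# Toy example: two RGB tokens attend over two thermal tokens (hypothetical data).
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[0.5, 0.5], [1.0, -1.0]]
fused = cross_attention(rgb, thermal, thermal)

# Toy contrastive batch: two classes with unit-norm features.
loss = sup_con_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
                    labels=[0, 0, 1, 1])
```

Because the attention weights sum to one, each fused token stays inside the convex hull of the thermal features, which is one way the complementary modality's information is mixed into the detection features.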
Zhibin Xiao, Pengwei Xie, Guijin Wang
Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
Yue Cao, Yanshuo Fan, Junchi Bin, Zheng Liu
Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes