Wenbing Liu, Haibo Wang, Quanxue Gao, Zhaorui Zhu
Abstract Since single-modal data usually contain limited information, a great deal of effort has been devoted to exploiting the complementary information contained in multi-modal data across various patterns. This paper therefore proposes an object detection method that fully utilizes multi-modal data. First, the method introduces a transformer mechanism to fuse intra-modal and inter-modal features across different modalities, exploiting the complementarity between modalities to improve multi-modal object detection performance. Second, a contrastive loss suited to contrastive learning is applied, enabling label information to be used effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
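The two ingredients named in the abstract, inter-modal feature fusion via attention and a label-aware contrastive loss, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy RGB/thermal features, and the temperature value are all hypothetical, and the attention here is a single unprojected scaled dot-product head rather than a full transformer block.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: tokens of one modality (queries)
    attend over tokens of another modality (keys/values)."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Each fused token is a convex combination of the other modality's values.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

def sup_con_loss(feats, labels, tau=0.5):
    """Supervised contrastive loss sketch: pull together L2-normalized
    features that share a label, push apart those that do not."""
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(feats[i], feats[j])) / tau
                for j in range(n)]
        denom = sum(math.exp(sims[j]) for j in range(n) if j != i)
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        total += -sum(math.log(math.exp(sims[p]) / denom) for p in pos) / len(pos)
        count += 1
    return total / max(count, 1)

# Toy example: two RGB tokens attend over two thermal tokens (hypothetical data).
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[0.5, 0.5], [1.0, -1.0]]
fused = cross_attention(rgb, thermal, thermal)

# Toy contrastive batch: two classes with unit-norm features.
loss = sup_con_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
                    labels=[0, 0, 1, 1])
```

Because the attention weights sum to one, each fused token stays inside the convex hull of the thermal features, which is one way the complementary modality's information is mixed into the detection features.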
Zhibin Xiao, Pengwei Xie, Guijin Wang
Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
Yue Cao, Yanshuo Fan, Junchi Bin, Zheng Liu
Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes