Object Detection via Multi-Scale Token Based on Vision Transformer

Xiao Yu; Tao Qiu; Xinqi Jiang; Qi Yang; Zhaowei Shang; Taiping Zhang

doi:10.1109/smc53992.2023.10393952


Year: 2023 Pages: 427-432


Abstract

Vision transformers have achieved impressive performance on object detection. Conventional transformers focus only on multi-scale features across tokens. However, these methods do not attend to the fine-grained features inside a single token, which can lead to a loss of semantic information in object detection. To address this issue, we propose a novel network consisting of three components. (1) The Internal Multiscale Token Module (IMTM) focuses on the receptive field size of each token and transforms the token dimension to extract more multi-scale features within the self-attention layer, thereby improving the performance and generalization ability of the model. (2) The Differential Filter Module (DFM) uses a convolutional network to focus on high-frequency information in the image, helping the transformer learn edge features and establish local context, and further improves performance through residual connections. (3) The Feature Fusion Module (FFM) enhances the local and global information extracted by the network by fusing information from different dimensions. Extensive experiments on PASCAL VOC show that the proposed method achieves state-of-the-art object detection performance.
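The abstract does not describe the internal layout of the Differential Filter Module. As a rough illustration of the idea it names (a high-pass, edge-responsive convolution combined with a residual connection), the sketch below filters each channel of a feature map with a fixed 3x3 negative-Laplacian kernel and adds the response back to the input. The kernel, the `alpha` scale, and the names `conv2d_same` and `differential_filter` are hypothetical choices for this sketch; the paper's DFM uses a convolutional network whose weights and structure are not given here.

```python
import numpy as np

# Hypothetical fixed high-pass kernel (negative Laplacian): a second-order
# difference operator that is zero on flat regions and responds to edges.
# The actual DFM uses a convolutional network whose kernels are not specified.
HIGH_PASS = np.array([[0.0, -1.0, 0.0],
                      [-1.0, 4.0, -1.0],
                      [0.0, -1.0, 0.0]])


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2-D cross-correlation of one (H, W) map, output the same size as the input.

    Borders use edge (replicate) padding, so a constant map gives zero response.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def differential_filter(feat: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical DFM-style block on a (C, H, W) feature map.

    Each channel is high-pass filtered, and the scaled response is added back
    to the input through a residual connection, which sharpens edges while
    leaving flat regions unchanged. `alpha` is an assumed scaling factor.
    """
    high_freq = np.stack([conv2d_same(channel, HIGH_PASS) for channel in feat])
    return feat + alpha * high_freq


if __name__ == "__main__":
    # A single-channel 5x5 map with a vertical step edge between columns 2 and 3.
    step = np.zeros((1, 5, 5))
    step[:, :, 3:] = 1.0
    print(differential_filter(step)[0, 2])
```

On a constant map the high-pass response is zero everywhere, so the block returns its input unchanged; on the step edge above, the value just left of the edge is pushed down to -1 and the value just right of it is pushed up to 2, the edge-emphasizing behavior the abstract attributes to the DFM. Replacing the fixed kernel with learned convolution weights would turn this into a trainable module.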

Keywords:

Computer science, Security token, Computer vision, Artificial intelligence, Object detection, Transformer, Pattern recognition (psychology), Engineering, Electrical engineering, Computer network, Voltage


Topics (all under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Advanced Image and Video Retrieval Techniques

Image and Object Detection Techniques

Advanced Neural Network Applications


Related Documents

Multi-Scale Vision Transformer for Defect Object Detection

TCF-DETR: multi-scale token-channel fusion transformer for enhanced small object detection

Generative EO/IR multi-scale vision transformer for improved object detection

TSVT: Token Sparsification Vision Transformer for RGB-D Salient Object Detection

Scale-aware token-matching for transformer-based object detector