JOURNAL ARTICLE

FPDT: a multi-scale feature pyramidal object detection transformer

Kailai Huang, Mi Wen, Chen Wang, Lina Ling

Year: 2023 | Journal: Journal of Applied Remote Sensing | Vol: 17 (02) | Publisher: SPIE

Abstract

Object detection is a fundamental component of autonomous driving algorithms, and with the rise of transformers in recent years, many computer vision tasks have integrated transformers into object detectors to achieve better generalization. Building a pure transformer-based detector seems an attractive choice; however, transformers are not omnipotent, and they come with painful drawbacks. Their fundamental operator, multi-head self-attention (MHSA), is computationally expensive due to its quadratic complexity, demanding unreasonably high memory usage and yielding critically low throughput. To address this issue, we use a convolution operation to simulate MHSA, referencing its philosophy and principles and migrating them to convolutional neural networks (CNNs). This yields a detector that is both powerful and fast. Furthermore, a multi-scale pyramidal feature extractor gives the detector a better view across scales. Overall, our proposed object detector follows the philosophy of the attention mechanism: a multi-scale feature pyramidal CNN encoder simulates the transformer, and a real transformer query neck extracts all of the objects at once and feeds them to the output heads. After training on the COCO2017 dataset, by combining the construction philosophy of the object detector with the philosophy and characteristics of the transformer, our FPDT-Tiny achieves an average precision (AP) of up to 34.1 in only 150 epochs, which is 16.0 and 10.8 points higher than the CNN-based YOLOv3-Base and SSD-300, respectively. Under the same number of epochs, our FPDT-Small achieves an AP of up to 37.7, which is 10.4 and 7.9 points higher than the transformer-based detectors YOLOS-Small and DETR-ResNet-152, respectively, demonstrating comparable performance.

Keywords:
Computer science, Detector, Transformer, Convolutional neural network, Encoder, Artificial intelligence, Object detection, Computer vision, Pattern recognition (psychology), Electrical engineering, Voltage, Engineering

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.73
Refs: 0
Citation Normalized Percentile: 0.65

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence