SRDD: a lightweight end-to-end object detection with transformer

Yuan Zhu; Qingyuan Xia; Wen Jin

doi:10.1080/09540091.2022.2125499

JOURNAL ARTICLE

Year: 2022; Journal: Connection Science; Vol: 34 (1); Pages: 2448-2465; Publisher: Taylor & Francis

Abstract

Computer vision now plays a vital role in modern UAV (Unmanned Aerial Vehicle) systems. However, on-board real-time small object detection for UAVs remains challenging. This paper presents an end-to-end ViT (Vision Transformer) detector, named Sparse ROI-based Deformable DETR (SRDD), to make ViT models available to UAV on-board systems. We embed a scoring network in the transformer T-encoder to selectively prune redundant tokens and, at the same time, introduce an ROI-based detection refinement module in the decoder to optimise detection performance while maintaining an end-to-end detection pipeline. By using scoring networks, we compress the Transformer encoder/decoder to a 1/3-layer structure, which is far slimmer than that of DETR. With the help of the lightweight backbone ResT and dynamic anchor boxes, we relieve the memory shortage of the on-board SoC. Experiments on the UAVDT dataset show that the proposed SRDD method achieves 50.2% mAP (outperforming Deformable DETR by at least 7%). In addition, the lightweight version of SRDD achieves 51.08% mAP with a 44% reduction in parameters.
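The token pruning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the scoring network is a single linear layer followed by a sigmoid and that pruning keeps a fixed fraction of the highest-scoring tokens. The names `score_tokens`, `prune_tokens`, and `keep_ratio` are hypothetical, and the weights are random stand-ins for learned parameters.

```python
import math
import random


def score_tokens(tokens, weights, bias=0.0):
    """Score each token with one linear layer plus a sigmoid (assumed scorer)."""
    scores = []
    for tok in tokens:
        logit = sum(w * x for w, x in zip(weights, tok)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order."""
    # Always keep at least one token so later layers have some input.
    num_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:num_keep])
    return [tokens[i] for i in kept], kept


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Eight toy tokens standing in for flattened feature-map positions.
    tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]
    weights = [rng.uniform(-1, 1) for _ in range(dim)]  # stand-in for learned weights
    scores = score_tokens(tokens, weights)
    kept_tokens, kept_idx = prune_tokens(tokens, scores, keep_ratio=0.5)
    print(f"kept {len(kept_tokens)} of {len(tokens)} tokens at positions {kept_idx}")
```

Because attention cost grows with the number of tokens, dropping low-scoring tokens before the encoder layers is one plausible way such a scheme could reduce compute and memory; how SRDD selects and trains its scorer is described in the full paper, not here.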

Keywords:

End-to-end principle; Computer science; Transformer; Artificial intelligence; Computer vision; Electrical engineering; Voltage; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 1.86

Citation Normalized Percentile: 0.84

Topics

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Video Surveillance and Tracking Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

End-to-End Object Detection with Adaptive Clustering Transformer

End-to-End Human Object Interaction Detection with HOI Transformer

V-DETR: Pure Transformer for End-to-End Object Detection