T-SSD: A Transformer-based Single-Stage Multi-Scale Sampling Object Detector

Kailai Huang; Mi Wen; Chen Wang; Lina Ling

doi:10.18178/wcse.2023.06.022

JOURNAL ARTICLE

Abstract

Object detection algorithms are a cornerstone of autonomous driving systems, and most are based on convolutional neural networks (CNNs) with one or two stages. Because detection accuracy bears directly on the driver's safety, it is crucial, yet it is limited by the underlying CNN, which has become hard to improve. Meanwhile, the basic transformer performs better than advanced CNNs, so transformers appear to be the better route to higher accuracy. However, most transformer-based detectors are merely backbone replacements, extensions of the ViT concept, or fusions with CNNs, and they do not fully exploit the characteristics of the transformer. We propose T-SSD (Transformer-based Single-Stage Detector), a single-stage object detector with multi-scale feature modeling ability. The transformer backbone extracts features at different scales and aggregates them into an intermediate representation. The transformer neck then directly queries semantic information from the aggregated representation and feeds it to the heads, which make all predictions in a single pass. By combining the design philosophy of object detectors with the characteristics of transformers, our T-SSD-Tiny, trained on COCO2017, achieves an AP (Average Precision) up to 9.0 higher than CNN-based detectors with 100 fewer epochs, outperforming YOLOv3-Base and SSD-300. In addition, our T-SSD-Small achieves an AP up to 4.7 higher than transformer-based detectors trained for the same number of epochs, indicating performance comparable to DETR-ResNet-101 and YOLOS-Small.
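The backbone, neck, and head data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture. The abstract does not say how tokens are formed or how the neck queries them, so everything below is an assumption made for illustration: the patch sizes, the random (untrained) projection weights, the use of single-head cross-attention for the "query" step, the number of queries, and the function names `patchify`, `project`, `cross_attend`, and `detect_sketch`.

```python
import math
import random


def patchify(image, patch):
    """Split a square 2-D grid into non-overlapping patch x patch tokens (flattened)."""
    n = len(image)
    tokens = []
    for r in range(0, n - patch + 1, patch):
        for c in range(0, n - patch + 1, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens


def project(tokens, dim, rng):
    """Random linear projection to width `dim` (a stand-in for learned weights)."""
    in_dim = len(tokens[0])
    w = [[rng.gauss(0, 1 / math.sqrt(in_dim)) for _ in range(dim)] for _ in range(in_dim)]
    return [[sum(t[k] * w[k][d] for k in range(in_dim)) for d in range(dim)] for t in tokens]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, memory):
    """Single-head scaled dot-product attention: each query reads from all memory tokens."""
    dim = len(memory[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        weights = softmax(scores)
        out.append([sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)])
    return out


def detect_sketch(image, patch_sizes=(2, 4), dim=8, num_queries=3, seed=0):
    """Toy single-pass pipeline; all sizes here are hypothetical."""
    rng = random.Random(seed)
    # "Backbone": tokens at several scales, projected to one width and concatenated.
    memory = []
    for p in patch_sizes:
        memory.extend(project(patchify(image, p), dim, rng))
    # "Neck": a fixed set of query vectors reads from the aggregated tokens.
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
    attended = cross_attend(queries, memory)
    # "Heads": one linear map per query to 4 box numbers plus 2 class scores.
    predictions = project(attended, 6, rng)
    return predictions, len(memory)


image = [[(r * 8 + c) / 64 for c in range(8)] for r in range(8)]
predictions, memory_size = detect_sketch(image)
```

For an 8x8 input with patch sizes 2 and 4, the aggregated representation holds 16 + 4 = 20 tokens, and each of the 3 queries yields one 6-value prediction in a single pass. The weights are random, so the numbers carry no meaning; only the shape of the computation is shown.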

Keywords:

Computer science; Detector; Transformer; Electrical engineering; Voltage; Engineering; Telecommunications

Topics

Industrial Vision Systems and Defect Detection

Physical Sciences → Engineering → Industrial and Manufacturing Engineering

Related Documents

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

MT-SSD: Single-Stage 3D Object Detector Based on Magnification Transformation

Improved Transformer-Based SSD Detector for Airborne Object Detection

AGS-SSD: Attention-Guided Sampling for 3D Single-Stage Detector

DA-SSD: Domain Adaptation for 3D Single Stage Object Detector