Kailai Huang, Mi Wen, Chen Wang, Lina Ling
Object detection algorithms are a cornerstone of autonomous driving systems, and most of them are based on convolutional neural networks (CNNs) with one or two stages. Because detection is closely tied to the driver's safety, detector accuracy is crucial, yet it is limited by its foundation, the CNN, which has become hard to improve further. Meanwhile, even the basic transformer has shown better performance than advanced CNNs, so transformers appear to be the better choice for improving accuracy. However, most transformer-based detectors are merely backbone replacements, extensions of the ViT concept, or fusions with CNNs, and they do not fully exploit the characteristics of the transformer. We propose T-SSD (Transformer-based Single-Stage Detector), a single-stage object detector with multi-scale feature modeling ability. The transformer backbone extracts features at different scales and aggregates them into an intermediate representation. The transformer neck then directly queries semantic information from this aggregated representation and feeds it to the heads, which make all predictions in a single pass. By combining the design philosophy of object detectors with the characteristics of transformers, T-SSD-Tiny, trained on COCO2017, achieves an AP (Average Precision) up to 9.0 higher than CNN-based detectors while training for 100 fewer epochs, outperforming YOLOv3-Base and SSD-300. In addition, T-SSD-Small achieves an AP up to 4.7 higher than transformer-based detectors trained for the same number of epochs, indicating performance comparable to DETR-ResNet-101 and YOLOS-Small.
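The pipeline described in the abstract (multi-scale backbone features aggregated into one representation, a neck that queries it, heads that predict in a single pass) can be illustrated with a toy, dependency-free sketch. It is not T-SSD's implementation: flatten-and-concatenate aggregation, single-head cross-attention with learned queries, a DETR-style extra "no object" class, and the helper names `aggregate_multiscale`, `query_neck`, and `predict` are all hypothetical assumptions, and random numbers stand in for backbone outputs and trained weights.

```python
import math
import random


def aggregate_multiscale(feature_maps):
    """Flatten each H x W map of d-dim vectors into tokens and concatenate all scales.

    Assumption: plain flatten-and-concatenate stands in for T-SSD's aggregation step.
    """
    tokens = []
    for fmap in feature_maps:      # fmap: list of rows, each row a list of d-dim vectors
        for row in fmap:
            tokens.extend(row)
    return tokens


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)                    # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def query_neck(queries, tokens):
    """Hypothetical neck: single-head scaled dot-product cross-attention.

    Each query attends over all aggregated tokens and returns their weighted average.
    """
    d = len(tokens[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, t) / math.sqrt(d) for t in tokens])
        outputs.append([sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)])
    return outputs


def linear(x, weight):
    # weight: list of output rows, each of length len(x)
    return [dot(row, x) for row in weight]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(query_features, w_cls, w_box):
    """Hypothetical heads: one class-logit vector and one normalized box per query."""
    preds = []
    for f in query_features:
        logits = linear(f, w_cls)
        box = [sigmoid(v) for v in linear(f, w_box)]  # (cx, cy, w, h), each in (0, 1)
        preds.append((logits, box))
    return preds


# Toy run with random stand-ins for backbone features and learned weights.
random.seed(0)
d, num_queries, num_classes = 8, 5, 3


def rand_map(h, w):
    return [[[random.gauss(0, 1) for _ in range(d)] for _ in range(w)] for _ in range(h)]


features = [rand_map(4, 4), rand_map(2, 2), rand_map(1, 1)]  # three scales
tokens = aggregate_multiscale(features)                      # 16 + 4 + 1 = 21 tokens
queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_queries)]
# Assumption: DETR-style extra "no object" class, hence num_classes + 1 logits.
w_cls = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(num_classes + 1)]
w_box = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(4)]
preds = predict(query_neck(queries, tokens), w_cls, w_box)
print(len(tokens), len(preds))  # 21 5
```

Because each query yields exactly one (class, box) pair, the number of predictions is fixed by the number of queries rather than by anchors tiled over a grid, which is one way to read "predictions in a single pass". A real detector would add multi-head attention, feed-forward layers, positional encodings, and a training-time matching loss.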
Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang
Qifeng Liu, Yabo Dong, Dawei Zhao, Liang Xiao, Bin Dai, Chen Min, Junru Zhang, Yiming Nie, Dongming Lu
Hanxiang Qian, Peng Wu, Bei Sun, Shaojing Su
Jiaxun Tong, Kaiqi Liu, Xia Bai, Wei Li