Te Li, Huajun Wang, Guangzhi Li, Songshan Liu, Tang Li
Abstract: In recent years, the transformer has achieved great success in NLP and is gradually being applied to computer vision. However, because of the characteristics of images, notably their high resolution, the computational complexity of the transformer is quite high. The windowing operation proposed by Swin Transformer effectively alleviates this problem. We observe that Swin Transformer has a hierarchical structure similar to that of a CNN, so we propose SwinF, a network with feature fusion built on Swin Transformer. On a COCO-type dataset, Swin Transformer achieves 40.3 mAP on object detection, while SwinF achieves 42.5 mAP.
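To illustrate why the windowing operation lowers the cost, the sketch below compares the approximate complexity of global multi-head self-attention with that of window-based self-attention. The two formulas are taken from the original Swin Transformer paper, not from this abstract; the function names and the example sizes (a 56 x 56 feature map, 96 channels, 7 x 7 windows) are assumptions chosen only for illustration.

```python
def msa_flops(h: int, w: int, c: int) -> int:
    """Approximate cost of global self-attention on an h x w feature map
    with c channels: 4*h*w*c^2 + 2*(h*w)^2*c (quadratic in h*w)."""
    tokens = h * w
    return 4 * tokens * c * c + 2 * tokens * tokens * c


def wmsa_flops(h: int, w: int, c: int, m: int) -> int:
    """Approximate cost of window-based self-attention with m x m windows:
    4*h*w*c^2 + 2*m^2*h*w*c (linear in h*w for a fixed window size m)."""
    tokens = h * w
    return 4 * tokens * c * c + 2 * m * m * tokens * c


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not reported in the abstract).
    h, w, c, m = 56, 56, 96, 7
    global_cost = msa_flops(h, w, c)
    window_cost = wmsa_flops(h, w, c, m)
    print(f"global attention:   {global_cost:,}")
    print(f"windowed attention: {window_cost:,}")
    print(f"ratio: {global_cost / window_cost:.1f}x")
```

The only term that differs is the attention term: it scales with the square of the token count for global attention but only linearly for a fixed window size, which is why the gap widens as image resolution grows.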
Ying Zhang, Lin Wu, Huaxuan Deng, Jun Hu, Xifan Li
Lei Liu, Yidi Jiao, Xiaoran Li, Jing Li, Haitao Wang, Xinyu Cao