JOURNAL ARTICLE

Combining Swin Transformer and Attention-Weighted Fusion for Scene Text Detection

Xianguo Li, Xingchen Yao, Y. F. Liu

Journal: Neural Processing Letters; Year: 2024; Vol: 56 (2); Publisher: Springer Science+Business Media

Abstract

Existing text detection algorithms based on Convolutional Neural Networks (CNNs) commonly suffer from insufficient receptive fields and inadequate extraction of spatial positional information. These limitations make it hard for them to detect text instances with large scale variation and long or widely spaced text instances, and to distinguish text from complex background textures. To address these problems, this paper proposes a scene text detection algorithm that combines the Swin Transformer with attention-weighted fusion. First, an attention-weighted fusion (AWF) module is proposed, which embeds a modified coordinate attention module (CAM) in the feature pyramid network (FPN). This module learns spatial positional weights for foreground information in features of different scales while suppressing redundant background information. As a result, the fused features concentrate on text regions, which improves the localization of text regions and their boundaries. Second, the window-based self-attention mechanism of the Swin Transformer is applied to the fused pyramid features to achieve global feature perception. This compensates for the limited receptive fields of CNNs and strengthens the representation of global contextual features, further improving text detection performance. Experimental results show that the proposed algorithm achieves competitive performance on three public datasets, ICDAR2015, MSRA-TD500, and Total-Text, with F-measures of 87.9%, 91.4%, and 86.7%, respectively. Code is available at https://github.com/xgli411/ST-AWFNet.
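
The window-based self-attention mentioned in the abstract can be illustrated with a minimal single-head sketch in Python. It is not the authors' ST-AWFNet implementation: the function names `window_self_attention` and `softmax` are hypothetical, and the sketch assumes identity query/key/value projections, no relative position bias, and no shifted windows, all of which the real Swin Transformer adds. The feature map is a nested list indexed as [row][column] -> channel vector, and the point it shows is that each token attends only to the tokens inside its own non-overlapping window.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed [row][col] -> channel vector


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_self_attention(fmap: FeatureMap, window: int) -> FeatureMap:
    """Single-head self-attention restricted to non-overlapping windows.

    Simplifying assumptions (not the paper's configuration): identity
    Q/K/V projections, no relative position bias, no shifted windows.
    """
    height, width = len(fmap), len(fmap[0])
    if height % window or width % window:
        raise ValueError("feature map size must be divisible by the window size")
    dim = len(fmap[0][0])
    scale = 1.0 / math.sqrt(dim)
    out: FeatureMap = [[[0.0] * dim for _ in range(width)] for _ in range(height)]

    for top in range(0, height, window):
        for left in range(0, width, window):
            # Positions of the window*window tokens that form this window.
            coords = [(r, c) for r in range(top, top + window)
                      for c in range(left, left + window)]
            tokens = [fmap[r][c] for r, c in coords]
            for (r, c), query in zip(coords, tokens):
                # Scaled dot-product scores against tokens of the SAME window only.
                scores = [scale * sum(q * k for q, k in zip(query, key))
                          for key in tokens]
                weights = softmax(scores)
                # Output is the attention-weighted sum of the value vectors
                # (values are the tokens themselves under the identity assumption).
                out[r][c] = [sum(w * v[d] for w, v in zip(weights, tokens))
                             for d in range(dim)]
    return out
```

For example, calling `window_self_attention(fmap, window=2)` on a 4×4 map processes four independent 2×2 windows, so changing a token in the bottom-right window leaves the outputs in the top-left window unchanged. Because one such layer only mixes information within a window, Swin Transformer alternates regular and shifted window partitions across consecutive blocks so that information can cross window boundaries.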

Metrics

Cited by: 8
FWCI (Field Weighted Citation Impact): 4.24
References: 46
Citation Normalized Percentile: 0.89

Topics

Handwritten Text Recognition Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Image Processing and 3D Reconstruction (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Vehicle License Plate Recognition (Physical Sciences → Engineering → Media Technology)

Related Documents

JOURNAL ARTICLE

Attention-Based Scene Text Detection on Dual Feature Fusion

Yuze Li, Wushour Silamu, Zhenchao Wang, Miaomiao Xu

Journal: Sensors; Year: 2022; Vol: 22 (23); Pages: 9072

JOURNAL ARTICLE

Natural Scene Text Detection Algorithm Combining Multi-granularity Feature Fusion

Zhuo Wang

Source: DOAJ (Directory of Open Access Journals); Year: 2021

JOURNAL ARTICLE

A fusion-attention swin transformer for cardiac MRI image segmentation

Ruiping Yang, Kun Liu, Yongquan Liang

Journal: IET Image Processing; Year: 2023; Vol: 18 (1); Pages: 105-115