Mengjie Zhong, Xihan Wang, Lian-He Shao, Quanli Gao
ABSTRACT End-to-end scene text spotting has attracted considerable academic interest in recent years. However, due to complex environmental factors, text recognition remains a formidable challenge. In this paper, we introduce an end-to-end scene text spotting framework, referred to as DSNet. The framework comprises two principal modules: a text feature enhancement module (TFEM) that enhances text regions and a redundant feature suppression module (RFSM) that suppresses noise. Within the TFEM, we design multiple transformer layers for feature encoding, which extract and enhance the feature representation of text regions. Within the RFSM, we design a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU), which effectively suppress irrelevant information through a feature reconstruction process. The proposed framework jointly optimizes text features by operating the TFEM and RFSM in parallel. The fused features from both modules are then passed to the decoder, enabling precise text localization and robust character recognition. Extensive experiments demonstrate that our model achieves competitive performance in end-to-end scene text spotting, attaining an F-measure of 90.2% on ICDAR2015, closely approaching the state of the art (91.0%).
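The abstract describes two branches (TFEM and RFSM) operating in parallel on shared features, with their outputs fused before decoding. The following is a minimal NumPy sketch of that dataflow only; the branch internals (a linear-plus-ReLU stand-in for the transformer encoder, a channel mask stand-in for the SRU/CRU reconstruction) and the additive fusion are simplifying assumptions, not the paper's actual operators.

```python
import numpy as np

def tfem(features, w_enc):
    # Stand-in for the transformer-based text feature enhancement branch
    # (assumption: a single linear projection + ReLU replaces the encoder layers).
    return np.maximum(features @ w_enc, 0.0)

def rfsm(features, channel_mask):
    # Stand-in for redundant feature suppression: zero out channels deemed
    # irrelevant (assumption: a fixed binary mask replaces the SRU/CRU units).
    return features * channel_mask

def fuse_and_decode(features, w_enc, channel_mask, w_dec):
    enhanced = tfem(features, w_enc)            # enhancement branch
    suppressed = rfsm(features, channel_mask)   # suppression branch, run in parallel
    fused = enhanced + suppressed               # fusion by addition (assumption)
    return fused @ w_dec                        # decoder stand-in

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))             # 4 tokens, 8 feature channels
w_enc = rng.standard_normal((8, 8))
mask = (rng.random(8) > 0.5).astype(float)
w_dec = rng.standard_normal((8, 2))
out = fuse_and_decode(feats, w_enc, mask, w_dec)
print(out.shape)  # (4, 2)
```

The key structural point the sketch preserves is that both branches consume the same input features and only their outputs are combined, matching the abstract's claim that the modules operate in parallel rather than sequentially.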