JOURNAL ARTICLE

Partial Scene Text Retrieval

Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai

Year: 2024; Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence; Vol: 47 (3); Pages: 1548-1563; Publisher: IEEE Computer Society

Abstract

Partial scene text retrieval aims to localize and retrieve, from an image gallery, text instances that are identical or similar to a given query text. Existing methods, however, handle only text-line instances; searching for partial patches within those text lines remains unsolved because the training data lack patch-level annotations. To address this issue, we propose a network that retrieves both text-line instances and their partial patches simultaneously. Our method embeds the two types of data (query text and scene text instances) into a shared feature space and measures their cross-modal similarities. To handle partial patches, it adopts Multiple Instance Learning (MIL) to learn their similarities to the query text without extra annotations. However, bag construction, a standard step in conventional MIL, can introduce many noisy training samples and slow down inference. We therefore propose Ranking MIL (RankMIL) to adaptively filter out these noisy samples. Additionally, we present a Dynamic Partial Match Algorithm (DPMA) that searches directly for the target partial patch within a text-line instance at inference time, without bags, which substantially improves both search efficiency and partial-patch retrieval performance. We evaluate the proposed method on English and Chinese datasets on two tasks: retrieving text-line instances and retrieving partial patches. For English text retrieval, averaged over three datasets, our method outperforms state-of-the-art approaches by 8.04% mAP on text-line retrieval and 12.71% mAP on partial-patch retrieval. For Chinese text retrieval, averaged over three datasets, it surpasses state-of-the-art approaches by 24.45% mAP and 38.06% mAP on the two tasks, respectively. The source code and dataset are available at https://github.com/lanfeng4659/PSTR.
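
The abstract contrasts bag-based MIL, which scores many candidate sub-patches of a text line, with DPMA, which searches a text line directly. As a point of reference, the sketch below shows only the brute-force baseline that such a bag implies; it is not DPMA, whose details are not given here. It assumes, hypothetically, that a text-line instance is encoded as a sequence of per-column feature vectors and the query as one vector in the shared space. The names `cosine`, `best_partial_match` and `min_len` are illustrative and do not come from the PSTR code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_partial_match(columns: List[Vector], query: Vector,
                       min_len: int = 1) -> Tuple[int, int, float]:
    """Brute-force partial match (hypothetical helper, not the paper's DPMA).

    Every contiguous span [start, end) of the text-line feature sequence is a
    candidate patch, i.e. one instance of an implicit MIL bag. A span is
    represented by the mean of its column features and scored by cosine
    similarity to the query; the bag score is the max over instances.
    Returns (start, end, score); (0, 0, -inf) if no span has length >= min_len.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    n, d = len(columns), len(query)
    # prefix[k][c] holds the sum of feature c over columns[0:k], so each
    # span mean costs O(d) instead of O((end - start) * d).
    prefix = [[0.0] * d]
    for col in columns:
        prefix.append([p + x for p, x in zip(prefix[-1], col)])

    best = (0, 0, -math.inf)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            length = end - start
            mean = [(prefix[end][c] - prefix[start][c]) / length for c in range(d)]
            score = cosine(mean, query)
            if score > best[2]:  # ties keep the first span visited (lowest start, then lowest end)
                best = (start, end, score)
    return best


if __name__ == "__main__":
    # Toy 2-D features: columns 1 and 2 align with the query direction.
    cols = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(best_partial_match(cols, [1.0, 0.0], min_len=2))  # (1, 3, 1.0)
```

Because every span is scored, the cost grows quadratically with the length of the text line, which illustrates why enumerating bag instances at inference time is expensive and why a direct search such as DPMA is attractive.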

Keywords:
Computer science, Artificial intelligence, Computer vision, Information retrieval, Pattern recognition (psychology)

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 62
Citation Normalized Percentile: 0.23

Topics

Handwritten Text Recognition Techniques
Natural Language Processing Techniques
Web Data Mining and Analysis

Related Documents

BOOK-CHAPTER

Single Shot Scene Text Retrieval

Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas

Lecture Notes in Computer Science; Year: 2018; Pages: 728-744

JOURNAL ARTICLE

Visual and semantic guided scene text retrieval

Hailong Luo, Mayire Ibrayim, Askar Hamdulla, Qilin Deng

Journal: The Journal of Supercomputing; Year: 2024; Vol: 80 (14); Pages: 21394-21411

BOOK-CHAPTER

Shared Vision Transformer Helps Scene Text Retrieval

Hailong Luo, Mayire Ibrayim, Askar Hamdulla, Qilin Deng

Lecture Notes in Computer Science; Year: 2025; Pages: 391-405

JOURNAL ARTICLE

A scene text-based image retrieval system

Thuy Ho, Ngoc Quoc Ly

Year: 2012; Vol: 2; Pages: 79-84