Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

Maoyuan Ye; Jing Zhang; Juhua Liu; Chenyu Liu; Baocai Yin; Cong Liu; Bo Du; Dacheng Tao

doi:10.1109/tpami.2024.3495831

JOURNAL ARTICLE

Year: 2024
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Vol: 47 (3)
Pages: 1431-1447
Publisher: IEEE Computer Society

Abstract

The Segment Anything Model (SAM), a profound vision foundation model pretrained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in segmentation across four hierarchies, including pixel-level text, word, text-line, and paragraph, while realizing layout analysis as well. Specifically, we first turn SAM into a high-quality pixel-level text segmentation (TS) model through a parameter-efficient fine-tuning approach. We use this TS model to iteratively generate the pixel-level text labels in a semi-automatic manner, unifying labels across the four text hierarchies in the HierText dataset. Subsequently, with these complete labels, we launch the end-to-end trainable Hi-SAM based on the TS architecture with a customized hierarchical mask decoder. During inference, Hi-SAM offers both automatic mask generation (AMG) mode and promptable segmentation (PS) mode. In the AMG mode, Hi-SAM segments pixel-level text foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing. As for the PS mode, Hi-SAM provides word, text-line, and paragraph masks with a single point click. Experimental results show the state-of-the-art performance of our TS model: 84.86% fgIOU on Total-Text and 88.96% fgIOU on TextSeg for pixel-level text segmentation. Moreover, compared to the previous specialist for joint hierarchical detection and layout analysis on HierText, Hi-SAM achieves significant improvements: 4.73% PQ and 5.39% F1 on the text-line level, 5.49% PQ and 7.39% F1 on the paragraph level layout analysis, requiring fewer training epochs.
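
The abstract mentions two operations that a short sketch may make more concrete: scoring a pixel-level text mask with foreground IoU (fgIOU), and sampling foreground points from that mask to serve as prompts in the AMG mode. The Python below is an illustrative sketch only, not the authors' implementation. The function names `fg_iou` and `sample_foreground_points` are hypothetical, masks are plain nested lists of 0/1, the IoU is computed for a single image, and uniform random sampling is an assumption, since the abstract does not say how Hi-SAM selects its points or how the metric is aggregated over a dataset.

```python
import random
from typing import List, Tuple

Mask = List[List[int]]  # 1 = text foreground, 0 = background


def fg_iou(pred: Mask, gt: Mask) -> float:
    """Foreground IoU between two binary masks of the same shape."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def sample_foreground_points(mask: Mask, n: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Pick up to n distinct (x, y) foreground pixels uniformly at random.

    Uniform sampling is an assumption made for illustration.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return rng.sample(coords, min(n, len(coords)))


# Toy 3x4 masks: 3 pixels overlap, 5 pixels are foreground in either mask.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
gt = [[0, 1, 1, 1],
      [0, 1, 0, 0],
      [0, 0, 0, 0]]

score = fg_iou(pred, gt)                          # 3 / 5
prompts = sample_foreground_points(pred, n=2)     # two (x, y) point prompts
```

In a pipeline of the kind the abstract describes, each sampled point would then be passed to the promptable decoder to obtain word, text-line, and paragraph masks; that step depends on the model itself and is not sketched here.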

Keywords:

Artificial intelligence; Computer science; Segmentation; Image segmentation; Pattern recognition (psychology); Natural language processing; Computer vision

Metrics

FWCI (Field Weighted Citation Impact): 27.50

Citation Normalized Percentile: 0.99

Topics

Web Data Mining and Analysis

Physical Sciences → Computer Science → Information Systems

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Text and Document Classification Technologies

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Segment Anything Model (SAM)

Rectal cancer segmentation via HHF-SAM: a hierarchical hypercolumn-guided fusion segment anything model

An Investigation of Segment Anything Model (SAM) on Uterus Segmentation

SAM-Glomeruli: Enhanced Segment Anything Model for Precise Glomeruli Segmentation

WeakPolyp-SAM: Segment Anything Model-driven weakly-supervised polyp segmentation