CRASpell: A Contextual Typo Robust Approach to Improve Chinese Spelling Correction

Shulin Liu; Shengkang Song; Tianchi Yue; Tao Yang; Houzhi Cai; TingHao Yu; Song Sun

doi:10.60692/vhf4q-knx41

JOURNAL ARTICLE

Year: 2022 Journal: Greater South Information System

Abstract

Chinese spelling correction (CSC) models detect and correct typos in text based on each misspelled character and its context. Recently, BERT-based models have dominated CSC research. These methods have two limitations. (1) They perform poorly on multi-typo texts. In such texts, the context of each typo contains at least one other misspelled character, which introduces noise, and this noisy context degrades performance. (2) They tend to overcorrect valid expressions into more frequent ones because of BERT's masked-token recovery task. We attempt to address these limitations in this paper. To make our model robust to the contextual noise introduced by typos, our approach first constructs a noisy context for each training sample. The correction model is then forced to yield similar outputs for the noisy and original contexts. Moreover, to address the overcorrection problem, a copy mechanism is incorporated to encourage our model to prefer the input character when both the miscorrected character and the input character are valid in the given context. Experiments are conducted on widely used benchmarks. Our model outperforms state-of-the-art methods by a remarkable margin. We release the source code and pre-trained model for further use by the community.
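The three ideas in the abstract (a noisy context, an output-consistency constraint, and a copy mechanism) can be illustrated with a minimal sketch. This is not the released implementation. The function names, the confusion-set lookup, the window size, the use of a symmetric KL divergence as the similarity measure, and the linear mixing of copy and generation probabilities are all assumptions made for illustration; the abstract only says that outputs should be "similar" and that a copy mechanism is used.

```python
import math
import random


def make_noisy_context(chars, typo_positions, confusion, window=5, rng=random):
    """Return a copy of `chars` with one extra typo placed near an existing typo.

    `confusion` maps a character to a list of confusable replacements
    (a hypothetical confusion set; the abstract does not say how noise is chosen).
    """
    noisy = list(chars)
    if not typo_positions:
        return noisy
    center = rng.choice(sorted(typo_positions))
    lo, hi = max(0, center - window), min(len(chars), center + window + 1)
    # Only touch positions that are not already typos and have a known confusable.
    candidates = [i for i in range(lo, hi)
                  if i not in typo_positions and chars[i] in confusion]
    if candidates:
        i = rng.choice(candidates)
        noisy[i] = rng.choice(confusion[chars[i]])
    return noisy


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two distributions over the vocabulary.

    Assumed here as the consistency penalty between the model's outputs
    on the original and the noisy context.
    """
    kl_pq = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    kl_qp = sum(b * math.log((b + eps) / (a + eps)) for a, b in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def mix_with_copy(p_generate, input_index, copy_weight):
    """Blend a generation distribution with a one-hot 'copy the input' distribution.

    A larger `copy_weight` shifts probability mass toward keeping the input
    character, which is one way to discourage overcorrection (assumed form).
    """
    mixed = [(1.0 - copy_weight) * p for p in p_generate]
    mixed[input_index] += copy_weight
    return mixed
```

In a training loop along these lines, the consistency term would be added to the usual correction loss for each typo position, and `copy_weight` would be predicted by the model rather than fixed by hand.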

Keywords:

Spelling; Context (archaeology); Task (project management); Character (mathematics); Noise (video); Security token; Code (set theory); Matching (statistics)

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Authorship Attribution and Profiling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Contextual Spelling Correction

IME-MTCSC: A Multi-Typo Chinese Spelling Correction Model by Input Method Editor

Robust Spelling Correction