JOURNAL ARTICLE

A spelling correction program based on a noisy channel model

Abstract

This paper describes a new program, correct, which takes words rejected by the Unix spell program, proposes a list of candidate corrections, and sorts them by probability. The probability scores are the novel contribution of this work. Probabilities are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in the speech recognition literature (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (e.g., insertions, deletions, substitutions and reversals). Both sets of probabilities were trained on data collected from the Associated Press (AP) newswire. This text is ideally suited for this purpose since it contains a large number of typos (about two thousand per month).
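
The ranking rule in the abstract can be illustrated with a minimal Python sketch; it is not the paper's implementation. By Bayes' rule, Pr(c|t) is proportional to Pr(c) Pr(t|c), because Pr(t) is the same for every candidate, so candidates can be sorted by that product. The word counts, the flat per-operation channel probabilities, and the names WORD_COUNTS, CHANNEL_PROB, edit_type and rank_corrections are all assumptions made for this example. The abstract states that both models were trained on AP newswire data, which this toy table does not reproduce.

```python
from typing import List, Optional, Tuple

# Hypothetical toy counts standing in for a prior Pr(c) trained on a corpus.
WORD_COUNTS = {
    "actress": 30, "across": 100, "access": 60, "acres": 40,
    "caress": 2, "cress": 1, "address": 50,
}

# Hypothetical flat channel probabilities Pr(t|c), one per edit operation.
CHANNEL_PROB = {
    "deletion": 0.003, "insertion": 0.002,
    "substitution": 0.001, "reversal": 0.0005,
}


def edit_type(correct: str, typo: str) -> Optional[str]:
    """Name the single edit that turns `correct` into `typo`, else None."""
    lc, lt = len(correct), len(typo)
    if lt == lc + 1:  # the typist inserted one extra letter
        if any(typo[:i] + typo[i + 1:] == correct for i in range(lt)):
            return "insertion"
    elif lt == lc - 1:  # the typist dropped one letter
        if any(correct[:i] + correct[i + 1:] == typo for i in range(lc)):
            return "deletion"
    elif lt == lc:
        diffs = [i for i in range(lc) if correct[i] != typo[i]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and correct[diffs[0]] == typo[diffs[1]]
                and correct[diffs[1]] == typo[diffs[0]]):
            return "reversal"  # two adjacent letters swapped
    return None


def rank_corrections(typo: str) -> List[Tuple[str, float]]:
    """Score each one-edit candidate by Pr(c) * Pr(t|c), best first."""
    total = sum(WORD_COUNTS.values())
    scores = {}
    for word, count in WORD_COUNTS.items():
        op = edit_type(word, typo)
        if op is not None:
            scores[word] = (count / total) * CHANNEL_PROB[op]
    norm = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Normalize so the candidate scores sum to 1.
    return [(word, score / norm) for word, score in ranked]


if __name__ == "__main__":
    for word, prob in rank_corrections("acress"):
        print(f"{word:8s} {prob:.3f}")
```

With these made-up numbers the prior and the channel pull in different directions: a frequent word reached by an unlikely edit can still outrank a rarer word reached by a likely one. A fuller channel model could condition on the letters involved in each edit, not only on the operation type.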

Keywords:
Spelling, Computer science, Word (group theory), Channel (broadcasting), Speech recognition, Spell, Language model, Argument (complex analysis), Noise (video), Natural language processing, Bayesian probability, Artificial intelligence, Algorithm, Mathematics, Linguistics

Metrics

Cited By: 264
FWCI (Field Weighted Citation Impact): 4.35
Refs: 7
Citation Normalized Percentile: 0.93

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence