Abstract

We present a new large-scale collection of 54,772 queries with manually annotated spelling corrections. For 9,170 of the queries (16.74%), spelling variants that differ from the original query are proposed. Our new corpus is an order of magnitude larger than other publicly available query spelling corpora. In addition to releasing the corpus, we provide an implementation of the winner of the 2011 Microsoft Speller Challenge and compare it, on the different publicly available corpora, against spelling corrections mined from Google and Bing. This comparison also sheds some light on the spelling correction performance of state-of-the-art commercial search systems.

Keywords:
Spelling, Computer science, Natural language processing, Artificial intelligence, Scale (ratio), Information retrieval, Query expansion, Speech recognition, Linguistics

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 1.60
Refs: 17
Citation Normalized Percentile: 0.86

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
