Morphological analysis is an important task in Estonian learner language studies because it provides information about the words and forms that learners use. Learner texts frequently contain spelling errors, and a morphological analyser fails to find the correct analysis for misspelled words, so these texts should undergo an error correction step before conventional morphological analysis tools are applied. In this paper, we compare several spelling correction models with the aim of improving the lemmatisation accuracy of learner language texts. Experiments show that the simplest model, a non-word noisy-channel spelling corrector with a disambiguation model applied on top of the morphological analyser output, performs best, while some of the more complicated models fail to beat even the baseline that does not include any spelling correction.
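To illustrate the general idea of non-word noisy-channel correction, the following is a minimal sketch and not the model evaluated in this paper. It assumes a toy word-frequency lexicon, a uniform channel probability for any single edit, and a hand-written alphabet; the names `edits1`, `correct`, `ALPHABET`, and `p_edit` are hypothetical. The morphological analyser and the disambiguation model are not shown.

```python
from collections import Counter

# Assumed alphabet for candidate generation (illustrative only).
ALPHABET = "abcdefghijklmnopqrsšzžtuvwõäöüxy"


def edits1(word):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts, p_edit=0.01):
    """Pick argmax P(candidate) * P(word | candidate) over in-lexicon candidates.

    Non-word model: a word already in the lexicon is returned unchanged.
    The channel probability is assumed uniform (p_edit) for every single
    edit, so the ranking here reduces to candidate frequency.
    """
    if word in counts:
        return word
    candidates = [c for c in edits1(word) if c in counts]
    if not candidates:
        return word  # no known word within one edit; leave as is
    total = sum(counts.values())
    return max(candidates, key=lambda c: (counts[c] / total) * p_edit)


# Toy lexicon with made-up frequencies (assumption, not real corpus counts).
toy_counts = Counter({"kool": 50, "kooli": 30, "koer": 20})
print(correct("koel", toy_counts))
```

In a fuller pipeline, a learned error model would replace the uniform `p_edit`, and the corrected tokens would be passed on to the morphological analyser for lemmatisation.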