Normalization of text messages using character- and phone-based machine translation approaches

Chen Li; Yang Liu

doi:10.21437/interspeech.2012-611

JOURNAL ARTICLE

Year: 2012 Pages: 2330-2333

Abstract

SMS and Twitter messages contain many abbreviations and non-standard words, which are problematic for text-to-speech (TTS) and other language processing techniques applied to these data. A character-based machine translation (MT) approach was previously used to normalize non-standard words. In this paper, we propose a two-stage translation method that leverages phonetic information: non-standard words are first translated to possible pronunciations, which are then translated to standard words. We further combine this method with the single-step character-based translation module. Our experiments show that the proposed method significantly outperforms previous results in both n-best coverage and 1-best accuracy.
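The two-stage idea in the abstract can be illustrated with a small Python sketch. This is not the authors' system: the lookup tables below are invented stand-ins for trained character-level and phone-level translation models, the phone strings are ARPAbet-style examples, and the linear interpolation used to combine the two modules is an assumption, since the abstract does not say how the combination is done. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Toy stand-ins for trained translation models. Every entry and score
# here is invented for illustration only.
CHAR_MODEL = {  # non-standard token -> [(standard word, score)]
    "2moro": [("tomorrow", 0.6), ("tomorrow's", 0.1)],
    "b4": [("before", 0.5)],
}
TOKEN_TO_PHONES = {  # stage 1: token -> [(phone sequence, score)]
    "2moro": [("T AH M AA R OW", 0.7)],
    "b4": [("B IH F AO R", 0.8)],
    "gr8": [("G R EY T", 0.9)],
}
PHONES_TO_WORD = {  # stage 2: phone sequence -> [(standard word, score)]
    "T AH M AA R OW": [("tomorrow", 0.9)],
    "B IH F AO R": [("before", 0.9)],
    "G R EY T": [("great", 0.8), ("grate", 0.2)],
}


def char_candidates(token):
    """Single-step, character-based candidates for a token."""
    return CHAR_MODEL.get(token, [])


def phone_candidates(token):
    """Two-stage candidates: token -> pronunciations -> standard words."""
    scores = defaultdict(float)
    for phones, p_phones in TOKEN_TO_PHONES.get(token, []):
        for word, p_word in PHONES_TO_WORD.get(phones, []):
            # Sum over pronunciations that lead to the same word.
            scores[word] += p_phones * p_word
    return list(scores.items())


def normalize(token, n_best=3, weight=0.5):
    """Merge both candidate lists by linear interpolation (an assumption)."""
    combined = defaultdict(float)
    for word, score in char_candidates(token):
        combined[word] += weight * score
    for word, score in phone_candidates(token):
        combined[word] += (1.0 - weight) * score
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_best]


if __name__ == "__main__":
    for tok in ("2moro", "b4", "gr8"):
        print(tok, normalize(tok))
```

In this toy setup, "gr8" has no character-based entry, so its candidates come only from the phone-based path; that is the kind of case where a second module can extend n-best coverage.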

Keywords:

Normalization (sociology); Computer science; Machine translation; Natural language processing; Leverage (statistics); Artificial intelligence; Character (mathematics); Phone; Speech recognition; Translation (biology); Linguistics; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 3.41

Citation Normalized Percentile: 0.94

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Normalization of shorthand forms in French text messages using word embedding and machine translation

Statistical machine translation based text normalization with crowdsourcing

Neural Machine Translation for Malay Text Normalization using Synthetic Dataset

Cognate Production using Character-based Machine Translation

Statistical models for text normalization and machine translation