Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition

Xinhui Hu; Masahiro Saiko; Chiori Hori

doi:10.1109/apsipa.2014.7041576

JOURNAL ARTICLE

Year: 2014

Abstract

Tone plays an important role in distinguishing lexical meaning in tonal languages such as Mandarin and Thai. Tone information has been shown to improve automatic speech recognition (ASR) for these languages. In this study, we incorporate tone features derived from the fundamental frequency (F0) and fundamental frequency variation (FFV) into the convolutional neural network (CNN), a state-of-the-art acoustic modeling approach, for acoustic modeling in ASR systems. Because the CNN can reduce spectral variations and model the spectral correlations present in speech signals, it is expected to model tone patterns well, since these appear mainly in the frequency domain as F0 contours. We conduct ASR experiments on Mandarin and Thai to evaluate the effectiveness of the proposed approaches. With the help of tone features, the character error rates (CERs) for Mandarin are reduced by 4.3-7.1% relative, and the word error rates (WERs) for Thai by 0.41-6.26% relative. The CNN is clearly superior to the deep neural network (DNN), with relative CER reductions of 5.4-13.1% for Mandarin and relative WER reductions of 0.5-5.6% for Thai.
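The improvements in the abstract are stated as relative reductions, that is, the drop in error rate divided by the baseline error rate. The sketch below shows that arithmetic only. The function name `relative_reduction` and the input values are assumptions made for illustration. The abstract does not report absolute error rates, so the numbers here are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error-rate reduction in percent.

    Both arguments are error rates (CER or WER) on the same scale,
    for example both given in percent.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical values for illustration only; NOT taken from the paper.
baseline_cer = 20.0  # CER (%) of a system without tone features
tone_cer = 19.0      # CER (%) of the same system with tone features

print(f"{relative_reduction(baseline_cer, tone_cer):.1f}% relative reduction")
```

Under these assumed values, an absolute drop of one CER point corresponds to a 5.0% relative reduction, which is the kind of figure the abstract reports.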

Keywords:

Mandarin Chinese; Speech recognition; Tone (literature); Computer science; Convolutional neural network; Artificial intelligence; Natural language processing; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 2.45

Citation Normalized Percentile: 0.90

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Mandarin speech recognition using convolution neural network with augmented tone features

Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network

Pitch tracking and tone features for Mandarin speech recognition

Thai Vowels Speech Recognition using Convolutional Neural Networks

Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network