Mandarin speech recognition using convolution neural network with augmented tone features

Xinhui Hu; Xugang Lu; Chiori Hori

doi:10.1109/iscslp.2014.6936674

JOURNAL ARTICLE

Year: 2014 Vol: 2008 Pages: 15-18

Abstract

Because it can reduce spectral variations and model the spectral correlations present in speech signals, the convolutional neural network (CNN) has been shown to model speech more effectively than the deep neural network (DNN). In this study, we explore applying the CNN to Mandarin speech recognition. Besides exploring appropriate CNN architectures for recognition performance, we focus on investigating effective acoustic features and on the effectiveness of adding tonal information, which has proven helpful in other types of acoustic models, to the acoustic features of the CNN. We conduct experiments on Mandarin broadcast speech recognition to test the effectiveness of the proposed approaches. The CNN is clearly superior to the DNN, with relative character error rate (CER) reductions of 7.7-13.1% for broadcast news speech (BN) and 5.4-9.9% for broadcast conversation speech (BC). As in Gaussian mixture model (GMM) and DNN systems, tonal information characterized by the fundamental frequency (F₀) and fundamental frequency variation (FFV) remains helpful in CNN models: it achieves relative CER reductions of over 6.7% for BN and 4.3% for BC, respectively, compared with the baseline Mel filter bank feature.
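The two quantities the abstract relies on, a relative CER reduction and an acoustic feature vector augmented with tone features, can be illustrated with a minimal Python sketch. The function names, the frame dimensions, and all numeric values below are hypothetical and are not taken from the paper. Frame-wise concatenation is only one plausible way to combine filter bank and tone features, and the abstract does not state that the authors used it.

```python
def relative_reduction(baseline_cer, new_cer):
    """Relative CER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


def augment_with_tone(fbank_frames, tone_frames):
    """Append per-frame tone features (for example F0 and FFV values)
    to the corresponding filter bank frames.

    Assumption: both inputs are lists of per-frame feature lists that
    are already aligned to the same frame rate.
    """
    if len(fbank_frames) != len(tone_frames):
        raise ValueError("frame counts differ")
    return [list(f) + list(t) for f, t in zip(fbank_frames, tone_frames)]


# Hypothetical numbers for illustration only; not results from the paper.
baseline_cer = 20.0   # CER (%) of an assumed baseline system
improved_cer = 18.0   # CER (%) of an assumed improved system
print(relative_reduction(baseline_cer, improved_cer))

# Two frames with 3 filter bank values and 2 tone values each (toy sizes).
fbank = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
tone = [[120.0, 0.01], [125.0, -0.02]]
augmented = augment_with_tone(fbank, tone)
print(len(augmented), len(augmented[0]))
```

Note that a relative reduction is measured against the baseline error rate, so the percentages quoted in the abstract are not absolute CER differences.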

Keywords:

Speech recognition; Mandarin Chinese; Computer science; Tone (literature); Convolution (computer science); Artificial neural network; Convolutional neural network; Artificial intelligence; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 2.90

Citation Normalized Percentile: 0.91

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition

Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network

Mandarin Chinese Tone Recognition with an Artificial Neural Network

Dysarthric Speech Recognition Using Deep Convolution Neural Network

Pitch tracking and tone features for Mandarin speech recognition