Abstract

The language identification (LID) task in the Robust Automatic Transcription of Speech (RATS) program is challenging due to the noisy nature of the audio data, collected over highly degraded radio communication channels, as well as the use of short-duration speech segments for testing. In this paper, we report recent advances on the RATS LID task obtained by using bottleneck features from a convolutional neural network (CNN). The CNN, which is trained with labelled data from one of the target languages, generates bottleneck features that are used in a Gaussian mixture model (GMM)-ivector LID system. The CNN bottleneck features provide substantial complementary information to the conventional acoustic features, even on languages not seen in the CNN's training. Using these bottleneck features in conjunction with acoustic features, we obtain significant improvements on the LID task (an average relative improvement of 25% in equal error rate (EER) over the corresponding acoustic-feature system). Furthermore, these improvements are consistent across various choices of acoustic features as well as speech segment durations.
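To make the feature-extraction step concrete, the following is a minimal numpy sketch of pulling frame-level activations from a narrow "bottleneck" layer of a small convolutional network. All layer sizes, the single convolution layer, and the random weights are illustrative assumptions, not the paper's architecture: in the actual system the CNN is trained on labelled speech from a target language, and the trained network's bottleneck activations then serve as features for the GMM-ivector backend.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time with ReLU.
    x: (T, F) frames x feature dims; w: (K, F, C); b: (C,)."""
    K = w.shape[0]
    T_out = x.shape[0] - K + 1
    out = np.stack([np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
                    for t in range(T_out)])
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
T, F = 50, 40          # frames x acoustic feature dims (hypothetical sizes)
C, K, B = 64, 5, 30    # conv channels, kernel width, bottleneck width

x = rng.standard_normal((T, F))            # one utterance of acoustic features
w1 = 0.1 * rng.standard_normal((K, F, C))  # conv-layer weights (untrained here)
b1 = np.zeros(C)
w_bn = 0.1 * rng.standard_normal((C, B))   # narrow bottleneck layer
b_bn = np.zeros(B)

h = conv1d_relu(x, w1, b1)        # (T-K+1, C) hidden activations
bottleneck = h @ w_bn + b_bn      # (T-K+1, B) frame-level bottleneck features

print(bottleneck.shape)           # (46, 30)
```

In practice the bottleneck features would be concatenated with (or substituted for) the conventional acoustic features before i-vector extraction, which is where the complementary gains reported above come from.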

Keywords:
Language identification, Bottleneck features, Convolutional neural network, Artificial neural network, Recurrent neural network, Acoustic model, Speech recognition, Speech processing, Pattern recognition, Natural language, Word error rate, Artificial intelligence

Metrics

Cited By: 47
FWCI (Field Weighted Citation Impact): 3.38
References: 17
Citation Normalized Percentile: 0.93 (in top 1%; in top 10%)

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)