Unsupervised utterance-wise beamformer estimation with speech recognition-level criterion

Takuya Higuchi; Takuya Yoshioka; Keisuke Kinoshita; Tomohiro Nakatani

doi:10.1109/icassp.2017.7953142

Year: 2017 Pages: 5170-5174

Abstract

In this paper, we perform beamforming with a speech recognition-level criterion. A beamformer is usually designed by optimizing a signal-level criterion, e.g., by minimizing the beamformer output covariance or by maximizing the signal-to-noise ratio (SNR). Such signal-level criteria do not always guarantee that the optimized beamformer is the best one for noise-robust automatic speech recognition. Recently, a few approaches have been proposed for beamforming with a speech recognition-level criterion. These approaches train beamformers jointly with an acoustic model, using multichannel training data and a parallel corpus of noisy and clean data. This paper proposes a novel approach that estimates a beamformer for every test utterance with a speech recognition-level criterion. We use an unsupervised acoustic model adaptation scheme to optimize the beamformer. Specifically, we first obtain decoding results with an initial beamformer, and then optimize the beamformer by backpropagation to minimize the cross entropy between the first-pass decoding results and the actual network outputs. With this approach, the beamformer can be trained to discriminate hidden Markov model states more clearly for every test utterance. Experimental results show that our beamformer outperforms a beamformer designed with a signal-level criterion.
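The two-pass adaptation loop described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation and makes several loudly stated assumptions: a per-frequency filter-and-sum beamformer in the STFT domain, log-power features, a toy linear-softmax classifier (hypothetical weights `A`, `b`) standing in for the acoustic-model network, frame-wise argmax labels standing in for first-pass HMM-state decoding results, and a channel-average initial beamformer. All function names (`beamform`, `forward`, `loss_and_grad`, `adapt_beamformer`) are hypothetical. The gradient with respect to the complex weights is derived by hand with Wirtinger calculus instead of an autodiff framework.

```python
import numpy as np


def beamform(w, Y):
    """Filter-and-sum beamformer: X[t, f] = sum_m conj(w[f, m]) * Y[t, f, m]."""
    return np.einsum("fm,tfm->tf", w.conj(), Y)


def forward(w, Y, A, b, eps=1e-3):
    """Beamform, extract log-power features, and score them with a toy linear-softmax model."""
    X = beamform(w, Y)                            # (T, F) complex beamformer output
    P = np.abs(X) ** 2                            # (T, F) output power
    feats = np.log(P + eps)                       # (T, F) log-power features
    logits = feats @ A.T + b                      # (T, K) scores for K hypothetical states
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # softmax posteriors
    return X, P, probs


def loss_and_grad(w, Y, A, b, labels, eps=1e-3):
    """Cross entropy w.r.t. fixed frame labels, and its derivative w.r.t. conj(w)."""
    T = Y.shape[0]
    X, P, probs = forward(w, Y, A, b, eps)
    rows = np.arange(T)
    loss = -np.mean(np.log(probs[rows, labels]))
    dlogits = probs.copy()
    dlogits[rows, labels] -= 1.0
    dlogits /= T                                  # dL/dlogits
    dfeats = dlogits @ A                          # dL/dfeats, (T, F)
    dP = dfeats / (P + eps)                       # chain rule through log(P + eps)
    # Wirtinger calculus: dP[t, f] / d conj(w[f, m]) = Y[t, f, m] * conj(X[t, f])
    grad = np.einsum("tf,tfm,tf->fm", dP, Y, X.conj())
    return loss, grad


def adapt_beamformer(Y, A, b, n_iter=50, lr=0.1, eps=1e-3):
    """Utterance-wise unsupervised adaptation: first pass, then descend on the cross entropy."""
    _, F, M = Y.shape
    w = np.full((F, M), 1.0 / M, dtype=complex)   # assumed initial beamformer: channel average
    _, _, probs = forward(w, Y, A, b, eps)
    labels = probs.argmax(axis=1)                 # stand-in for first-pass decoding results
    loss, grad = loss_and_grad(w, Y, A, b, labels, eps)
    history = [loss]
    for _ in range(n_iter):
        step = lr
        while step > 1e-8:                        # simple backtracking line search
            w_new = w - step * grad               # move against the conj(w) gradient
            new_loss, new_grad = loss_and_grad(w_new, Y, A, b, labels, eps)
            if new_loss < loss:
                w, loss, grad = w_new, new_loss, new_grad
                break
            step /= 2
        else:
            break                                 # no improving step found; stop early
        history.append(loss)
    return w, labels, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, M, K = 40, 16, 4, 5                     # frames, frequency bins, mics, states
    Y = rng.standard_normal((T, F, M)) + 1j * rng.standard_normal((T, F, M))
    A = 0.5 * rng.standard_normal((K, F))
    b = np.zeros(K)
    w, labels, history = adapt_beamformer(Y, A, b)
    print(f"cross entropy: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the labels are frozen after the first pass, the loss only pulls the beamformer toward making the already-decoded states more confident for this one utterance. A real system would backpropagate through the full acoustic-model network and its feature extraction, which this toy model does not reproduce.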

Keywords:

Computer science Speech recognition Adaptive beamformer Beamforming Hidden Markov model Decoding methods Pattern recognition (psychology) Artificial intelligence Noise (video) Algorithm Telecommunications

Related Documents

Unsupervised Speech Recognition via Utterance-wise Pseudo-labeling

Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition

Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion

Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition

Utterance-Wise Recurrent Dropout and Iterative Speaker Adaptation for Robust Monaural Speech Recognition