JOURNAL ARTICLE

Improving Voice Activity Detection for Multimodal Movie Dialogue Corpus

Abstract

Detecting speech segments in audio sequences is an important task for many applications. Although various methods have been developed for voice activity detection (VAD), their accuracy deteriorates when they are applied to movie data because of the background noise present in movies. This noise problem is addressed by using a deep neural network (DNN) model for VAD. Although the overall performance of the DNN-based model was satisfactory, performance dropped clearly when singing voices or musical sounds were present as background noise. In this study, the effectiveness of changing the VAD model from a binary classifier to a multi-class classifier was examined. The results showed that a DNN-based, multi-class VAD model can handle singing voices and musical sounds adequately. In the experiments, an equal error rate of 3.92% was obtained on movie data.
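The abstract describes replacing a binary speech/non-speech classifier with a multi-class classifier, so that singing voices and musical sounds receive their own classes instead of being forced into "non-speech". A minimal sketch of how such multi-class posteriors map back to a binary VAD decision; the class names, logit values, and 0.5 threshold are illustrative assumptions, not details from the paper:

```python
import math

# Hypothetical sketch, not the paper's implementation: a multi-class VAD
# assigns each frame posterior probabilities over several acoustic classes
# (here: speech, singing voice, music, other noise), and the binary
# speech/non-speech decision is recovered from the speech-class posterior.

CLASSES = ["speech", "singing", "music", "noise"]

def softmax(logits):
    """Convert raw classifier scores into posterior probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def vad_decision(frame_logits, threshold=0.5):
    """Label a frame as speech iff the speech-class posterior exceeds the threshold."""
    posteriors = softmax(frame_logits)
    return posteriors[CLASSES.index("speech")] >= threshold

# Per-frame logits as they might come from an upstream DNN (values made up):
frames = [
    [2.5, 0.1, 0.3, 0.2],  # speech-dominant frame
    [0.2, 2.0, 1.5, 0.1],  # singing/music frame -> non-speech for binary VAD
]
print([vad_decision(f) for f in frames])  # -> [True, False]
```

The point of the merge step is that singing and music get modeled explicitly during training but still collapse into "non-speech" at decision time, which is what lets the classifier stop confusing them with dialogue.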

Keywords:
Computer science, Speech recognition, Singing, Classifier, Word error rate, Binary classification, Artificial neural network, Voice activity detection, Background noise, Artificial intelligence, Noise, Pattern recognition, Speech processing, Acoustics, Support vector machine

Metrics

Cited by: 4
FWCI (Field-Weighted Citation Impact): 0.72
References: 14
Citation Normalized Percentile: 0.71

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)