Speech/Non-Speech Segmentation Based on Phoneme Recognition Features

Janez Žibert; Nikola Pavešić; France Mihelič

doi:10.1155/asp/2006/90495

JOURNAL ARTICLE

Year: 2006; Journal: EURASIP Journal on Advances in Signal Processing; Vol: 2006 (1); Publisher: Springer Science+Business Media

Abstract

This work assesses different approaches for speech and non-speech segmentation of audio data and proposes a new, high-level representation of audio signals based on phoneme recognition features suitable for speech/non-speech discrimination tasks. Unlike previous model-based approaches, where speech and non-speech classes were usually modeled by several models, we develop a representation where just one model per class is used in the segmentation process. For this purpose, four measures based on consonant-vowel pairs obtained from different phoneme speech recognizers are introduced and applied in two different segmentation-classification frameworks. The segmentation systems were evaluated on different broadcast news databases. The evaluation results indicate that the proposed phoneme recognition features are better than the standard mel-frequency cepstral coefficients and posterior probability-based features (entropy and dynamism). The proposed features proved to be more robust and less sensitive to different training and unforeseen conditions. Additional experiments with fusion models based on cepstral and the proposed phoneme recognition features produced the highest scores overall, which indicates that the most suitable method for speech/non-speech segmentation is a combination of low-level acoustic features and high-level recognition features.
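The abstract names entropy and dynamism of posterior probabilities as baseline features without defining them. The sketch below illustrates one commonly used formulation, assumed here rather than taken from the paper: per-frame Shannon entropy of the phoneme posteriors, and the squared change in posteriors between consecutive frames, each averaged over a window. The function names and the toy posterior values are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def frame_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one frame's posterior distribution."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)


def mean_entropy(frames: Sequence[Sequence[float]]) -> float:
    """Per-frame entropy averaged over a window of frames."""
    return sum(frame_entropy(f) for f in frames) / len(frames)


def mean_dynamism(frames: Sequence[Sequence[float]]) -> float:
    """Squared posterior change between consecutive frames, averaged."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, cur))
    return total / (len(frames) - 1)


# Toy windows (assumed values): confident, fast-changing posteriors
# versus flat, nearly static posteriors.
speech_like = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
noise_like = [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33], [0.33, 0.33, 0.34]]

print(mean_entropy(speech_like), mean_dynamism(speech_like))
print(mean_entropy(noise_like), mean_dynamism(noise_like))
```

In this toy setup the confident, rapidly changing window yields lower entropy and higher dynamism than the flat one, which is the kind of contrast such features are typically meant to capture. Real posteriors would come from an acoustic model, which is outside the scope of this sketch.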

Keywords:

Computer science; Speech recognition; Segmentation; Mel-frequency cepstrum; Speech segmentation; Acoustic model; Speech processing; Artificial intelligence; Hidden Markov model; Pattern recognition (psychology); Cepstrum; Speech coding; Feature extraction

Metrics

FWCI (Field Weighted Citation Impact): 2.36

Citation Normalized Percentile: 0.89

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Phoneme Segmentation in Speech Signals Using CTC-based Speech Recognition Model and Low-level Features

Phoneme-based continuous speech recognition without pre-segmentation

Phoneme based speech recognition

Robust Speech Detection Based on Phoneme Recognition Features

Significance of segmentation in phoneme based Tamil speech recognition system