Phoneme recognition using speech image (spectrogram)

Masoud Ahmadi; Nick Bailey; Brian Hoyle

doi:10.1109/icsigp.1996.567353

JOURNAL ARTICLE

Year: 2002 Vol: 1 Pages: 675-677

Abstract

In this paper a novel feature extraction technique based on the two-dimensional DCT (discrete cosine transform) and zigzag scanning of the spectrogram is proposed. This contrasts with conventional approaches based on one-dimensional analysis, such as LPC, cepstral, or FFT features. As a phoneme recognition task, a series of experiments was conducted on the voiced stops ('b', 'd', 'g') of the TIMIT database, uttered by 630 speakers (male and female). The extracted data form the input patterns for training two types of neural network: a semi-dynamic network (a time-delay neural network, TDNN) and a static network (a multilayer perceptron, MLP). The highest recognition rates, 77.5 and 72.4 percent, were recorded for the TDNN and the MLP respectively. By comparison, Hwang et al. (1992) report a rate of 72 percent for the same phonemes spoken by 40 females.
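The feature extraction step described above (2-D DCT of the spectrogram followed by zigzag scanning) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. It assumes that the spectrogram patch for one phoneme segment is already available as a 2-D list of magnitudes (rows = frequency bins, columns = time frames). It also assumes an orthonormal DCT-II, a JPEG-style zigzag order, and that only the first `n_coeffs` scanned coefficients are kept. The abstract does not specify the normalisation, patch size, or number of coefficients, and the names `dct2`, `zigzag`, and `extract_features` are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2-D DCT-II of an M x N block (direct O(M^2 N^2) form)."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        # Orthonormal scale factors for the row (u) and column (v) frequencies
        au = math.sqrt(1.0 / m) if u == 0 else math.sqrt(2.0 / m)
        for v in range(n):
            av = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            s = 0.0
            for x in range(m):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                for y in range(n):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                    s += block[x][y] * cx * cy
            out[u][v] = au * av * s
    return out


def zigzag(mat: Matrix) -> List[float]:
    """Read a matrix in JPEG-style zigzag order (low frequencies first)."""
    m, n = len(mat), len(mat[0])
    result = []
    for d in range(m + n - 1):
        # Row indices i on the anti-diagonal i + j == d that lie inside the matrix
        rows = range(max(0, d - n + 1), min(d, m - 1) + 1)
        # Alternate the traversal direction on every diagonal
        if d % 2 == 0:
            rows = reversed(rows)
        for i in rows:
            result.append(mat[i][d - i])
    return result


def extract_features(spectrogram: Matrix, n_coeffs: int) -> List[float]:
    """Hypothetical feature vector: first n_coeffs zigzag-ordered 2-D DCT coefficients."""
    return zigzag(dct2(spectrogram))[:n_coeffs]


if __name__ == "__main__":
    # Toy 4 x 4 "spectrogram" patch; real patches would come from a phoneme segment
    patch = [[1.0, 0.8, 0.5, 0.2],
             [0.9, 0.7, 0.4, 0.1],
             [0.6, 0.5, 0.3, 0.1],
             [0.2, 0.2, 0.1, 0.0]]
    print(extract_features(patch, 6))
```

Under these assumptions, zigzag order visits the low-frequency DCT coefficients first. Truncating the scanned vector therefore keeps the coarse spectro-temporal shape of the patch and discards fine detail, which gives a compact, fixed-length input pattern for a classifier such as the MLP or TDNN.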

Keywords:

Spectrogram, Speech recognition, Computer science, Mel-frequency cepstrum, Discrete cosine transform, Feature extraction, Artificial neural network, Pattern recognition (psychology), Artificial intelligence, Feature (linguistics), Time delay neural network, Cepstrum, TIMIT, Image (mathematics), Hidden Markov model

Metrics

FWCI (Field Weighted Citation Impact): 0.37

Citation Normalized Percentile: 0.63

Topics

Neural Networks and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Data Compression Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

Speech Emotion Recognition Using MELBP Variants of Spectrogram Image

A Novel Approach to Phoneme Recognition using Speech Image

SPEAKNet: Spectrogram-Phoneme Embedding Architecture for Knowledge-enhanced Speech Command Recognition

Speech and phoneme segmentation under noisy environment through spectrogram image analysis