This paper introduces an automated speech recognition system for the Amazigh language, built on a deep learning model based on a convolutional neural network (CNN) with features extracted from spectrograms. The research focuses on recognizing 18 isolated words from a dataset of 2,000 audio files collected from native Amazigh speakers in Morocco's Rif region. To represent the speech signal, the system employs spectrograms that plot time on the x-axis and frequency on the y-axis, with amplitude encoded as the intensity value at each position. These spectrograms serve as input to the deep CNN: a 1D convolutional network of eight layers used for feature learning and recognition. The model extracts discriminative features from the spectrograms and outputs predictions over the eighteen classes. The proposed CNN achieves an accuracy of 94.77%, which highlights the effectiveness of this approach for automatic speech recognition of isolated words in the Amazigh language.
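The spectrogram representation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the frame length, hop size, and window function are illustrative assumptions (25 ms frames with a 10 ms hop at 16 kHz), since the abstract does not specify the STFT parameters used.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram: rows index time, columns index frequency,
    and each cell's value is the amplitude at that (time, frequency) point.

    frame_len/hop correspond to 25 ms / 10 ms at 16 kHz; these are
    assumed values for illustration only.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT: frame_len // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
S = spectrogram(sig)
print(S.shape)  # (time frames, frequency bins)
```

In a pipeline like the one the paper describes, each audio file would be converted to such a time-frequency matrix and passed to the CNN, which learns discriminative patterns from it rather than from the raw waveform.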
Fatima Barkani, Hassan Satori, Mohamed Hamidi, Ouissam Zealouk, Naouar Laaidi