JOURNAL ARTICLE

Feature Extraction Techniques for Deep Learning based Speech Classification

Abstract

Audio classification is the task of assigning audio data to one of several classes or categories. Speaker recognition, one application of audio classification, aims to identify a person from the characteristics of their speech; the broader phrase "voice recognition" covers both speaker recognition and speech recognition. Speaker verification systems have recently grown popular for a variety of uses, such as security measures and personalized assistance: a computer trained to recognize individual voices can quickly transcribe speech or confirm a speaker's identity as part of a security procedure. Speaker recognition draws on four decades of research and rests on acoustic characteristics of speech that differ from person to person. Some systems match the voice of a person seeking entry against a database, much as fingerprint sensors match input fingerprint patterns or photographic attendance systems map inputs to a database. Personal assistants such as Google Home, for example, are designed to restrict access to authorized users, and these systems must identify or verify the speaker correctly even under difficult conditions. This research proposes a robust deep learning-based speaker recognition solution for audio classification. We augment the data using four key noise aberration strategies to improve the system's performance, and we conduct a comparative study of the effectiveness of several audio feature extractors. The objective is a speaker identification system that is highly accurate and applicable in practical situations.
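The abstract refers to four noise aberration strategies for self-augmenting the training data but does not enumerate them on this page. As a hedged illustration only, a minimal NumPy sketch of four common waveform perturbations (white-noise injection, time shift, random gain, and speed change — these are assumptions, not the paper's confirmed choices) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_white_noise(y, snr_db=20.0):
    """Additive Gaussian noise at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def time_shift(y, max_frac=0.1):
    """Circularly shift the waveform by up to max_frac of its length."""
    limit = int(len(y) * max_frac)
    return np.roll(y, rng.integers(-limit, limit + 1))

def random_gain(y, low_db=-6.0, high_db=6.0):
    """Scale the amplitude by a random gain drawn in dB."""
    gain_db = rng.uniform(low_db, high_db)
    return y * (10 ** (gain_db / 20))

def speed_change(y, rate=1.1):
    """Resample by linear interpolation to speed playback up or down."""
    idx = np.arange(0, len(y), rate)
    return np.interp(idx, np.arange(len(y)), y)

# Example: augment a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
augmented = [add_white_noise(tone), time_shift(tone),
             random_gain(tone), speed_change(tone, rate=0.9)]
```

Each perturbation yields a new training example from the same labeled utterance, which is what "self-augmenting" the data amounts to in practice.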

Keywords:
Computer science, Speech recognition, Speaker recognition, Audio mining, Speaker diarisation, Categorization, Feature (linguistics), Identification (biology), Feature extraction, Process (computing), Fingerprint (computing), Artificial intelligence, Speech processing, Acoustic model
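The abstract compares several audio feature extractors without naming them on this page. As one common choice in speech pipelines, a minimal log-mel filterbank extractor can be written in plain NumPy; all parameters below (FFT size, hop, number of mel bands) are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters mapping an FFT power spectrum to n_mels bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_features(y, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame, window, FFT, apply the mel filterbank, then take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)              # shape: (n_frames, n_mels)

# Example: features for a 1-second 440 Hz tone at 16 kHz.
sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
feats = log_mel_features(y, sr)
```

The resulting (frames × mel-bands) matrix is the kind of 2-D representation typically fed to a deep classifier; taking a discrete cosine transform over the mel axis would yield MFCCs, another extractor commonly included in such comparisons.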

Metrics

Cited by: 3
FWCI (Field-Weighted Citation Impact): 0.81
References: 18
Citation Normalized Percentile: 0.68


Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE
Speech/Music Classification Using Wavelet Based Feature Extraction Techniques
Thiruvengatanadhan Ramalingam, P. Dhanalakshmi
Journal of Computer Science, 2014, Vol. 10 (1), pp. 34-44

JOURNAL ARTICLE
Speech feature extraction and emotion recognition using deep learning techniques
Anil Kumar Pagidirayi, B. Anuradha
i-manager's Journal on Digital Signal Processing, 2024, Vol. 12 (2), pp. 1-1

JOURNAL ARTICLE
Deep Learning based Feature Extraction for Texture Classification
Philomina Simon, V. Uma
Procedia Computer Science, 2020, Vol. 171, pp. 1680-1687

JOURNAL ARTICLE
Deep Learning-Based Feature Extraction for Speech Emotion Recognition
Dharmendra Kumar Roy, Naga Venkata Gopi Kumbha, Harender Sankhla, Gaurav Raj, Bashetty Akhilesh
International Journal of Engineering Technology and Management Sciences, 2024, Vol. 8 (3), pp. 166-174