Yao-San Lin, Hung-Yu Chen, Mei-Ling Huang, Tsung-Yu Hsieh
Voiceprint recognition systems often face challenges related to limited and diverse datasets, which hinder their performance and generalization capabilities. This study proposes a novel approach that integrates generative adversarial networks (GANs) for data augmentation and convolutional neural networks (CNNs) with mel-frequency cepstral coefficients (MFCCs) for voiceprint classification. Experimental results demonstrate that the proposed methodology improves recognition accuracy by up to 15% in low-resource scenarios. The optimal ratio of real-to-GAN-generated samples was determined to be 3:2, which balanced dataset diversity and model performance. In specific cases, the model achieved an accuracy of 96.6%, showcasing its effectiveness in capturing unique voice characteristics while mitigating overfitting. These results highlight the potential of combining GAN-augmented data and CNN-based classification to enhance voiceprint recognition in diverse and resource-constrained environments.
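The 3:2 real-to-GAN-generated ratio can be illustrated with a minimal sketch of how such a training set might be assembled. The abstract does not say how the mix was built, so everything here is an assumption made for illustration: the function name `mix_real_and_synthetic`, the placeholder sample labels, and the choice to keep all real samples and subsample the synthetic pool.

```python
import random


def mix_real_and_synthetic(real, synthetic, ratio=(3, 2), seed=0):
    """Return a shuffled list of (sample, source) pairs with real:synthetic near `ratio`.

    Hypothetical helper: keeps every real sample and draws just enough
    synthetic samples to reach the requested ratio.
    """
    real_part, synth_part = ratio
    # Synthetic samples needed so that len(real) : n_synth == real_part : synth_part.
    n_synth = (len(real) * synth_part) // real_part
    if n_synth > len(synthetic):
        raise ValueError("not enough synthetic samples for the requested ratio")

    rng = random.Random(seed)
    chosen = rng.sample(synthetic, n_synth)  # sample without replacement

    mixed = [(x, "real") for x in real] + [(x, "gan") for x in chosen]
    rng.shuffle(mixed)
    return mixed


# Placeholder identifiers stand in for MFCC feature arrays.
real = [f"real_{i}" for i in range(30)]
synthetic = [f"gan_{i}" for i in range(50)]

train = mix_real_and_synthetic(real, synthetic)
n_real = sum(1 for _, src in train if src == "real")
n_gan = sum(1 for _, src in train if src == "gan")
print(n_real, n_gan)  # 30 20
```

With 30 real samples, the sketch draws 20 synthetic ones, giving the 3:2 proportion. In practice each entry would be an MFCC feature matrix passed to the CNN rather than a string label.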
Leipu Wang, Jun Sun, Jingming Sun, Junpeng Yu
Oleksandr Chaikovskyi, Artem Volokyta, Artemi Kyrianov, Heorhii Loutskii
Guangcheng Bao, Bin Yan, Li Tong, Jun Shu, Linyuan Wang, Kai Yang, Ying Zeng