One medical application of deep learning is the use of deep neural networks to classify human speech as healthy or pathological. In such applications, the audio signal is transformed into a spectrogram that captures its time-varying frequency content, and the resulting "images" are fed into a classifier. A challenge in applying this approach is the shortage of suitable speech data for training: acquiring labelled data requires significant human effort and/or time-consuming experiments. In this paper, we propose a semi-supervised learning approach that employs a Generative Adversarial Network (GAN) to alleviate the problem of insufficient training data. We compare the classification performance of a traditional classifier with that of our semi-supervised classifier. We observe that, when supplied with an equivalent number of training samples, the GAN-based semi-supervised approach achieves significantly better accuracy and ROC curves.
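As an illustration of the spectrogram step described above, the following is a minimal sketch, not the authors' pipeline. It assumes a Hann window, a naive discrete Fourier transform, and a synthetic 1 kHz tone sampled at 8 kHz; the function name `spectrogram` and the parameters `frame_len` and `hop` are hypothetical choices made for this example. It uses only the Python standard library, whereas a real system would more likely rely on a signal-processing library.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram: a list of frames, each a list of bins."""
    # Hann window (assumed here) reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT of the windowed frame; adequate for a small sketch.
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal (not data from the paper): 1 kHz tone at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as a single-channel image and passed to an image classifier. For the synthetic tone, the strongest bin of the first frame corresponds to 1 kHz.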
Toutouh, Jamal; Nalluru, Subhash; Hemberg, Erik; O'Reilly, Una-May