Feature extraction is central to ultrasound tongue image analysis. Inspired by the recent success of deep learning, we explore a novel approach to feature extraction from ultrasound tongue images using pre-trained convolutional neural networks (CNNs). Bottleneck features from different pre-trained CNNs, including VGGNet and ResNet, serve as representations of the ultrasound tongue images. An image classification task is then conducted to assess the effectiveness of the CNN-based features. Our dataset consists of 20,000 ultrasound tongue images collected from a female speaker of Mandarin Chinese, each manually labeled as containing one of the following consonants: /p, t, k, l/. Experimental results show that Gradient Boosting Machine (GBM) classifiers trained on the CNN-based features achieve the best performance, with a classification accuracy of 92.4% for ResNet and 91.6% for VGGNet. Both outperform the benchmark GBM classifier trained on features extracted using Principal Component Analysis (PCA), which achieves an accuracy of only 87.5%. On this preliminary dataset, our feature extraction method is therefore superior to the PCA-based method. This work demonstrates the potential of applying pre-trained convolutional neural networks to ultrasound tongue image analysis tasks.
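The evaluation protocol described above (extract a fixed-length feature vector per image, then train a classifier on those vectors and report accuracy) can be sketched for the PCA baseline as follows. This is a minimal illustration under several assumptions that do not come from the abstract: the frames are synthetic 16x16 arrays rather than real ultrasound images, the number of principal components (8) is arbitrary, and a simple nearest-centroid classifier stands in for the GBM classifier so that the sketch depends only on NumPy. All function names are hypothetical. In the CNN-based variant, `pca_transform` would be replaced by a forward pass through a pre-trained network truncated before its classification layer.

```python
import numpy as np


def make_synthetic_frames(n_per_class, labels, size, rng):
    """Hypothetical stand-in for ultrasound frames: noise plus one bright
    horizontal band whose position depends on the class label."""
    frames, y = [], []
    band = size // (len(labels) + 1)
    for k, label in enumerate(labels):
        for _ in range(n_per_class):
            img = rng.normal(0.0, 0.5, (size, size))
            img[1 + k * band: 1 + (k + 1) * band, :] += 1.0
            frames.append(img.ravel())  # flatten to one feature row
            y.append(label)
    return np.array(frames), np.array(y)


def pca_fit(X, n_components):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project centered samples onto the principal directions."""
    return (X - mean) @ components.T


def nearest_centroid_fit(F, y):
    """Assumed stand-in classifier (not GBM): one centroid per class."""
    classes = np.unique(y)
    centroids = np.stack([F[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(F, classes, centroids):
    """Assign each sample to the class with the closest centroid."""
    dists = ((F[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    labels = ["p", "t", "k", "l"]  # consonant classes named in the abstract
    X_train, y_train = make_synthetic_frames(50, labels, 16, rng)
    X_test, y_test = make_synthetic_frames(25, labels, 16, rng)

    # Fit PCA on training frames only, then project both splits.
    mean, comps = pca_fit(X_train, n_components=8)
    F_train = pca_transform(X_train, mean, comps)
    F_test = pca_transform(X_test, mean, comps)

    classes, centroids = nearest_centroid_fit(F_train, y_train)
    pred = nearest_centroid_predict(F_test, classes, centroids)
    accuracy = float((pred == y_test).mean())
    return F_train.shape, accuracy


if __name__ == "__main__":
    shape, accuracy = run_demo()
    print("feature matrix shape:", shape)
    print("test accuracy:", accuracy)
```

Because the synthetic classes are well separated by construction, the accuracy this sketch prints says nothing about real ultrasound data. Its purpose is only to show where the feature extractor and the classifier plug into the comparison.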