Environmental sound classification (ESC) is an important problem, but achieving high accuracy has always been challenging because of the lack of datasets. In this paper, we propose a new convolutional neural network (CNN) model that uses transfer learning for the ESC task. First, we represent each sound as an RGB image, where the red channel corresponds to the log-Mel spectrogram, the green channel corresponds to the scalogram, and the blue channel corresponds to the Mel-frequency cepstral coefficients (MFCCs). Second, we train a CNN architecture based on the Xception model, which performs strongly on the JFT dataset. Test results show that the proposed approach achieves better ESC accuracy.
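The channel layout described above can be illustrated with a minimal sketch. It assumes the three feature maps have already been computed as 2-D arrays. The helper names (`to_uint8`, `resize_nearest`, `stack_rgb`), the min-max scaling, the nearest-neighbour resizing, and the 128 x 128 target size are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def to_uint8(feature):
    """Min-max scale a 2-D feature map to the 0..255 range (assumed normalization)."""
    feature = np.asarray(feature, dtype=np.float64)
    span = feature.max() - feature.min()
    if span == 0:
        # A constant map carries no contrast; return an all-zero channel.
        return np.zeros(feature.shape, dtype=np.uint8)
    scaled = (feature - feature.min()) / span
    return np.round(scaled * 255).astype(np.uint8)


def resize_nearest(feature, height, width):
    """Nearest-neighbour resize so that all three channels share one shape."""
    rows = np.arange(height) * feature.shape[0] // height
    cols = np.arange(width) * feature.shape[1] // width
    return feature[np.ix_(rows, cols)]


def stack_rgb(log_mel, scalogram, mfcc, height=128, width=128):
    """Stack the features as R = log-Mel, G = scalogram, B = MFCC."""
    channels = [
        resize_nearest(to_uint8(f), height, width)
        for f in (log_mel, scalogram, mfcc)
    ]
    return np.stack(channels, axis=-1)


# Placeholder arrays standing in for real features (hypothetical shapes).
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(128, 216))
scalogram = rng.normal(size=(64, 216))
mfcc = rng.normal(size=(40, 216))

image = stack_rgb(log_mel, scalogram, mfcc)
print(image.shape, image.dtype)
```

An image built this way could then be passed to a pretrained image classifier for fine-tuning. The abstract does not say which framework, input size, or pretrained weights the authors used, so those choices are left open here.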
Jingyang Zhou, Jianrui Lu, Ruisong Wang, Ruofei Ma, Zhiliang Qin
Zhichao Zhang, Shugong Xu, Shan Cao, Shunqing Zhang
Sanjiban Sekhar Roy, Sanda Florentina Mihalache, Emil Pricop, Nishant Rodrigues
Dharma Rane, Pushkar Shirodkar, Trilochan Panigrahi, S. Mini