This paper proposes a novel biologically inspired method for sound event classification which combines spike coding with a spiking neural network (SNN). Our spike coding extracts keypoints representing the local maxima of the sound spectrogram and encodes them using their local time-frequency information, so that both location and spectral information are captured. We then design a modified tempotron SNN that, unlike the original tempotron, allows the network to learn the temporal distribution of the spike-coded input, in a manner analogous to the generalized Hough transform. The proposed method simultaneously enhances the sparsity of the sound event spectrogram, producing a representation that is robust to noise, and maximises the discriminability of the spike-coded input in terms of its temporal information, which is important for sound event classification. Experimental results on a large dataset of 50 environmental sound events show the superiority of the spike coding over the raw spectrogram and of the SNN over conventional cross-entropy neural networks.
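To illustrate the keypoint-based spike coding described above, the following is a minimal sketch of extracting local-maxima keypoints from a log-magnitude spectrogram together with their surrounding time-frequency patch. The function name, neighborhood size, patch size, and noise-gating threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def extract_keypoints(audio, fs, nperseg=512, neighborhood=(5, 5), patch=(3, 3)):
    """Find local-maxima keypoints of a log-magnitude spectrogram and
    attach the surrounding time-frequency patch as a descriptor.
    (Illustrative sketch; parameters are assumptions, not the paper's.)"""
    freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=nperseg)
    log_S = np.log(Sxx + 1e-10)

    # A bin is a keypoint if it equals the maximum of its local neighborhood.
    local_max = maximum_filter(log_S, size=neighborhood) == log_S
    # Discard maxima below the mean energy (simple noise gate, assumed here).
    peaks = local_max & (log_S > log_S.mean())

    keypoints = []
    df, dt = patch[0] // 2, patch[1] // 2
    for f_idx, t_idx in zip(*np.nonzero(peaks)):
        # Local time-frequency patch around the keypoint (clipped at the edges),
        # giving both the location and the spectral context of the keypoint.
        desc = log_S[max(0, f_idx - df):f_idx + df + 1,
                     max(0, t_idx - dt):t_idx + dt + 1]
        keypoints.append({"time": times[t_idx],
                          "freq": freqs[f_idx],
                          "descriptor": desc.copy()})
    return keypoints
```

Each keypoint's time of occurrence can then be mapped to a spike time, so that the temporal structure of the sound event is preserved for the downstream tempotron-style SNN.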