Different sound events have different time-frequency scale characteristics, which are useful for sound event detection (SED) but have not yet been effectively exploited. In this paper, we aim to adaptively select the multi-scale feature information that is conducive to classifying sound events. We propose a novel module, multi-scale residual attention (MSRA), which is composed of a multi-scale residual convolution block and a selective multi-scale attention block. The multi-scale residual convolution block extracts features at multiple scales, from which the selective multi-scale attention block adaptively selects the features that are helpful for event classification. Experimental results show that our method outperforms the state-of-the-art model by 3.7% on the DCASE 2018 Challenge Task 4 dataset.
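The abstract describes the module only at block level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes each scale is a single "same"-padded convolution branch with its own kernel size plus an identity residual, and it models the selection step on selective-kernel-style attention (sum the branches, global-average-pool, a two-layer bottleneck, softmax across branches per channel). The function names `conv2d_same`, `multi_scale_residual`, `selective_attention`, and `msra`, the kernel sizes (1, 3, 5), and the bottleneck width `d` are all hypothetical; normalization layers, pooling, and training are omitted.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation.
    x: (C_in, T, F) time-frequency feature map; w: (C_out, C_in, k, k), k odd."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad time and frequency axes
    _, T, F = x.shape
    out = np.zeros((c_out, T, F))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + T, j:j + F]  # shifted view, shape (C_in, T, F)
            out += np.einsum("oc,ctf->otf", w[:, :, i, j], patch)
    return out

def multi_scale_residual(x, kernels):
    """Hypothetical multi-scale residual block: one ReLU conv branch per
    kernel size, each with the input added back (requires C_out == C_in)."""
    return [np.maximum(conv2d_same(x, w), 0.0) + x for w in kernels]

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def selective_attention(branches, w_reduce, w_expand):
    """Assumed SK-style selection: per-channel softmax weights over branches.
    w_reduce: (d, C); w_expand: (B, C, d) for B branches."""
    U = np.stack(branches)                           # (B, C, T, F)
    s = U.sum(axis=0).mean(axis=(1, 2))              # (C,) global descriptor
    z = np.maximum(w_reduce @ s, 0.0)                # (d,) bottleneck with ReLU
    logits = np.einsum("bcd,d->bc", w_expand, z)     # (B, C) branch scores
    a = softmax(logits, axis=0)                      # weights sum to 1 over branches
    return np.einsum("bc,bctf->ctf", a, U), a        # weighted fusion, weights

def msra(x, kernels, w_reduce, w_expand):
    return selective_attention(multi_scale_residual(x, kernels), w_reduce, w_expand)

# Usage with random weights (illustrative only, no training).
rng = np.random.default_rng(0)
C, T, F, d = 4, 16, 8, 2                             # channels, frames, freq bins, bottleneck
x = rng.standard_normal((C, T, F))
kernels = [0.1 * rng.standard_normal((C, C, k, k)) for k in (1, 3, 5)]
w_reduce = rng.standard_normal((d, C))
w_expand = rng.standard_normal((len(kernels), C, d))
y, a = msra(x, kernels, w_reduce, w_expand)
print(y.shape)  # (4, 16, 8)
```

The per-channel softmax means each output channel is a convex combination of the small-, medium-, and large-kernel branches, which is one plausible way to let the network favor the time-frequency scale that suits a given event; the paper's actual weighting scheme may differ.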
Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian
Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda
Jie Yan, Yan Song, Li-Rong Dai, Ian McLoughlin