JOURNAL ARTICLE

Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks

Abstract

Convolutional recurrent neural networks (CRNNs) have achieved state-of-the-art performance for sound event detection (SED). In this paper, we propose using a dilated CRNN, namely a CRNN with dilated convolutional kernels, as the classifier for SED. We investigate the effectiveness of dilation operations, which give a CRNN an expanded receptive field for capturing long temporal context without increasing the number of parameters. Compared with the baseline CRNN classifier, the dilated CRNN classifier obtains maximum increases of 1.9%, 6.3% and 2.5% in F1 score and maximum decreases of 1.7%, 4.1% and 3.9% in error rate (ER) on the publicly available TUT-SED Synthetic 2016, TUT Sound Event 2016 and TUT Sound Event 2017 corpora, respectively.
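The claim that dilation widens the receptive field without adding parameters can be checked with a little arithmetic. The sketch below is illustrative only: it assumes stride-1, 1-D convolutions along the time axis, and the kernel size (3), dilation rates (1, 2, 4) and channel counts (40 input, 64 output) are hypothetical values chosen for the example, not the configuration reported in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 1-D convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv_params(kernel_size, in_channels, out_channels, num_layers):
    """Weights plus biases of the stack; dilation does not enter the count."""
    total = 0
    for layer in range(num_layers):
        c_in = in_channels if layer == 0 else out_channels
        total += kernel_size * c_in * out_channels + out_channels
    return total


# Hypothetical three-layer stacks with the same kernel size and channels.
plain_rf = receptive_field(3, [1, 1, 1])    # no dilation
dilated_rf = receptive_field(3, [1, 2, 4])  # assumed dilation rates
params = conv_params(3, 40, 64, 3)          # identical for both stacks

print(f"plain: {plain_rf} frames, dilated: {dilated_rf} frames, "
      f"parameters in either case: {params}")
```

Under these assumptions the dilated stack covers roughly twice as many time frames as the plain one, while both stacks have the same number of weights, because the dilation rate only changes the spacing between kernel taps.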

Keywords:
Computer science; Recurrent neural network; Classifier (UML); Convolutional neural network; Speech recognition; Word error rate; Artificial intelligence; Kernel (algebra); Pattern recognition (psychology); Artificial neural network; Mathematics

Metrics

Cited By: 45
FWCI (Field Weighted Citation Impact): 3.99
Refs: 29
Citation Normalized Percentile: 0.95

Topics

Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Music Technology and Sound Studies
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition