JOURNAL ARTICLE

A Multi-grained based Attention Network for Semi-supervised Sound Event Detection

Abstract

Sound event detection (SED) is an interesting but challenging task due to the scarcity of data and the diversity of sound events in real life. This paper presents a multi-grained based attention network (MGA-Net) for semi-supervised sound event detection. To obtain feature representations related to sound events, a residual hybrid convolution (RH-Conv) block is designed to boost the vanilla convolution's ability to extract time-frequency features. Moreover, a multi-grained attention (MGA) module is designed to learn temporal-resolution features from coarse level to fine level. With the MGA module, the network can capture the characteristics of target events of short or long duration, and thus determine the onset and offset of sound events more accurately. Furthermore, to effectively boost the performance of the Mean Teacher (MT) method, a spatial shift (SS) module is introduced as a data perturbation mechanism to increase the diversity of the data. Experimental results show that MGA-Net outperforms published state-of-the-art competitors, achieving event-based macro F1 (EB-F1) scores of 53.27% and 56.96%, and polyphonic sound detection scores (PSDS) of 0.709 and 0.739, on the validation and public sets respectively.
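The Mean Teacher (MT) method mentioned in the abstract maintains a teacher network whose weights are an exponential moving average (EMA) of the student's weights, and trains the student with a consistency loss between the two models' predictions on perturbed inputs. A minimal sketch of the EMA weight update, using toy NumPy arrays in place of real network parameters (the paper's actual decay rate and perturbation scheme are not given here, so `alpha=0.999` is an assumed, commonly used default):

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Mean Teacher EMA step: teacher <- alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` are dicts mapping parameter names to arrays;
    in a real SED model these would be the network's weight tensors.
    """
    return {name: alpha * teacher[name] + (1 - alpha) * student[name]
            for name in teacher}

# Toy parameters standing in for MGA-Net weights (hypothetical shapes).
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}

teacher = ema_update(teacher, student, alpha=0.9)
print(teacher["w"])  # each entry moves 10% of the way toward the student
```

Because the teacher averages the student over many steps, its predictions are smoother, which is what makes it a useful consistency target when the inputs are perturbed (e.g. by the spatial shift module described in the paper).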

Keywords:
Computer science, Convolution (computer science), Speech recognition, Artificial intelligence, Offset (computer science), Pattern recognition (psychology), Artificial neural network

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 1.40
Refs: 29
Citation Normalized Percentile: 0.82

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music Technology and Sound Studies
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
