JOURNAL ARTICLE

Polyphonic Sound Event Detection by Using Capsule Neural Networks

Fabio Vesperini, Leonardo Gabrielli, Emanuele Principi, Stefano Squartini

Year: 2019
Journal: IEEE Journal of Selected Topics in Signal Processing
Vol: 13 (2)
Pages: 310-322
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Artificial sound event detection (SED) aims to mimic the human ability to perceive and understand what is happening in the surroundings. Nowadays, deep learning offers valuable techniques for this goal, such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture was recently introduced in the image processing field to overcome some known limitations of CNNs, specifically their limited robustness to affine transformations (i.e., perspective, size, orientation) and their difficulty in detecting overlapping images. This motivated the authors to employ CapsNets for the polyphonic-SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through a so-called "dynamic routing" that encourages learning part-whole relationships and improves detection performance in a polyphonic context. This paper reports extensive evaluations carried out on three publicly available datasets, showing that the CapsNet-based algorithm not only outperforms standard CNNs but also achieves the best results with respect to state-of-the-art algorithms.
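
The "dynamic routing" mentioned in the abstract is the routing-by-agreement procedure introduced with CapsNets (Sabour et al., 2017). The following is a minimal NumPy sketch of that procedure, not the authors' implementation. It assumes the prediction vectors `u_hat` have already been computed by earlier layers, uses three routing iterations, and treats each output capsule as one sound event class whose vector length is read as the probability that the class is active. The helper `detect_events` and its 0.5 threshold are hypothetical, included only to show how several events can be flagged in the same frame.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement as described by Sabour et al. (2017).

    u_hat: prediction vectors, shape (n_in, n_out, dim_out); u_hat[i, j] is
           input capsule i's prediction for output capsule j.
    Returns the output capsules v, shape (n_out, dim_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, initially uniform
    for _ in range(n_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Each output capsule receives the coupling-weighted sum of its predictions.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        # Raise the logits where a prediction agrees (dot product) with the output.
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v


def detect_events(v, threshold=0.5):
    """Hypothetical multi-label decision: a class is active if its capsule length exceeds threshold."""
    lengths = np.linalg.norm(v, axis=-1)
    return lengths, lengths > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative sizes only: 32 input capsules, 6 event classes, 16-D output capsules.
    u_hat = rng.normal(size=(32, 6, 16))
    v = dynamic_routing(u_hat)
    lengths, active = detect_events(v)
    print(lengths.round(3), active)
```

Because each class has its own capsule and the decision is a per-class threshold rather than a softmax across classes, more than one event can be reported as active at once, which is the property the polyphonic setting requires.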
