Fabio Vesperini, Leonardo Gabrielli, Emanuele Principi, Stefano Squartini
Artificial sound event detection (SED) aims to mimic the human ability to perceive and understand what is happening in the surrounding environment. Deep learning currently offers valuable techniques for this goal, such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture was recently introduced in the image processing field to overcome some known limitations of CNNs, specifically their limited robustness to affine transformations (i.e., perspective, size, orientation) and their difficulty in detecting overlapping images. This motivated the authors to employ CapsNets for the polyphonic SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit capsule units to represent a set of distinctive properties of each individual sound event. Capsule units are connected through a so-called "dynamic routing" procedure that encourages the learning of part-whole relationships and improves detection performance in a polyphonic context. This paper reports extensive evaluations carried out on three publicly available datasets, showing that the CapsNet-based algorithm not only outperforms standard CNNs but also achieves the best results with respect to state-of-the-art algorithms.
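The "dynamic routing" mentioned in the abstract refers to the routing-by-agreement procedure between capsule layers introduced with CapsNets. The following is a minimal NumPy sketch of that iterative procedure, not the paper's actual implementation; the shapes, the number of routing iterations, and the function names are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squashing non-linearity: keeps the vector direction while
    # mapping its norm into [0, 1), so the length can act as a
    # probability that the capsule's entity is present.
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement between two capsule layers (sketch).

    u_hat: array of shape (n_in, n_out, dim) holding the prediction
    vector that each input capsule i makes for each output capsule j.
    Returns the output capsule vectors, shape (n_out, dim).
    """
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(n_iters):
        # Coupling coefficients: softmax of logits over output capsules
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions
        v = squash(s)                               # candidate output capsules
        # Increase the logit where prediction and output agree (dot product),
        # which is what encourages part-whole assignments.
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 4, 16))  # 8 input capsules, 4 outputs, 16-D poses
v = dynamic_routing(u_hat)
print(v.shape)  # (4, 16)
```

In a polyphonic SED setting, each output capsule would correspond to one sound event class, with the capsule vector's length interpreted as the event's activity.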