A Transpose-SELDNet for Polyphonic Sound Event Localization and Detection

Spoorthy Venkatesh; Shashidhar G. Koolagudi

doi:10.1109/i2ct57861.2023.10126251


JOURNAL ARTICLE


Year: 2023 Pages: 1-6


Abstract

Humans can identify an event occurring in their surroundings from sound cues alone, even when no visual scene is available. Sound events are the auditory cues present in an environment. Sound event detection (SED) determines the onset and offset of each sound event and assigns it a textual label. Sound source localization (SSL) additionally identifies the spatial location of each sound event. The combined task of SED and SSL is known as Sound Event Localization and Detection (SELD). This work explores three deep learning architectures for SELD: SELDNet, D-SELDNet (Depthwise Convolution), and T-SELDNet (Transpose Convolution). Two sets of features are used for the SED and Direction-of-Arrival (DOA) estimation tasks. D-SELDNet uses a depthwise convolution layer, which reduces the model's complexity in terms of computation time. T-SELDNet uses transpose convolution, which helps the model learn more discriminative features by retaining the input size, so that necessary information from the input is not lost. The proposed method is evaluated on the First-order Ambisonic (FOA) array format of the TAU-NIGENS Spatial Sound Events 2020 dataset. The proposed T-SELDNet shows an improvement over existing SELD systems.
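The two layer types named in the abstract can be illustrated with plain arithmetic. The following is a minimal sketch, not the authors' implementation: the abstract gives no layer hyperparameters, so the channel counts, kernel sizes, strides, and paddings below are hypothetical, and the pairing of the depthwise layer with a 1x1 pointwise layer is an assumption borrowed from the common depthwise-separable design. The functions use the standard parameter-count and output-size formulas for convolution and transposed convolution (dilation 1).

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Trainable parameters of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_params(c_in, k, bias=True):
    """Depthwise convolution: one k x k filter per input channel."""
    return k * k * c_in + (c_in if bias else 0)


def pointwise_params(c_in, c_out, bias=True):
    """1 x 1 convolution that mixes channels after a depthwise layer."""
    return c_in * c_out + (c_out if bias else 0)


def conv_out_size(n, k, stride=1, padding=0):
    """Output length of a convolution along one axis (dilation 1)."""
    return (n + 2 * padding - k) // stride + 1


def conv_transpose_out_size(n, k, stride=1, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (n - 1) * stride - 2 * padding + k + output_padding


# Hypothetical layer: 64 -> 64 channels with a 3 x 3 kernel.
standard = conv2d_params(64, 64, 3)
separable = depthwise_params(64, 3) + pointwise_params(64, 64)
print("standard conv parameters:      ", standard)
print("depthwise + pointwise parameters:", separable)

# Hypothetical axis of 128 bins.
n = 128
print("conv, k=3, no padding:          ", conv_out_size(n, 3))
print("transposed conv, k=3, no padding:", conv_transpose_out_size(n, 3))
print("transposed conv, k=3, padding=1: ", conv_transpose_out_size(n, 3, padding=1))

# A stride-2 convolution halves the axis; a matching transposed
# convolution with output_padding=1 restores the original length.
down = conv_out_size(n, 3, stride=2, padding=1)
up = conv_transpose_out_size(down, 3, stride=2, padding=1, output_padding=1)
print("down then up:", down, "->", up)
```

Under these assumed settings the depthwise variant needs far fewer parameters than the standard layer, and the transposed convolution either preserves the axis length (stride 1, padding 1) or restores it after downsampling, which is one way to read the abstract's claim about retaining the input size.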

Keywords:

Computer science; Discriminative model; Event (particle physics); Convolution (computer science); Transpose; Speech recognition; Artificial intelligence; Sound (geography); Pattern recognition (psychology); Acoustics; Artificial neural network

Metrics

FWCI (Field Weighted Citation Impact): 0.81

Citation Normalized Percentile: 0.65

Topics

Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)

Hearing Loss and Rehabilitation (Life Sciences → Neuroscience → Cognitive Neuroscience)

Related Documents

A parametric survey on polyphonic sound event detection and localization

Polyphonic sound event localization and detection using channel-wise FusionNet

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

U Recurrent Neural Network for Polyphonic Sound Event Detection and Localization

A Sequence Matching Network for Polyphonic Sound Event Localization and Detection