JOURNAL ARTICLE

Polyphonic Sound Event Detection with Weak Labeling

Wang, Yun

Year: 2023
Journal: OPAL (Open@LaTrobe) (La Trobe University)
Publisher: La Trobe University

Abstract

Sound event detection (SED) is the task of detecting the type as well as the onset and offset times of sound events in audio streams. It has applications in multimedia retrieval, surveillance and other areas. SED is difficult because sound events exhibit diverse temporal and spectral characteristics, and because they can overlap with each other. Ideally, SED systems should be trained with strong labeling, which provides the type, onset time and offset time of each sound event occurrence. However, such labeling is formidably tedious to produce by hand, so current research on SED often uses weak labeling instead. This thesis deals with two types of weak labeling: presence/absence labeling, which only states which types of events are present in each recording without any temporal information, and sequential labeling, which provides the order of sound events but no timestamps. Even when the training data is weakly labeled, we still want our SED systems to localize the sound events in time. SED with presence/absence labeling is usually treated as a multiple instance learning (MIL) problem, which requires a pooling function to aggregate frame-level predictions into recording-level predictions. In this thesis, we compare six pooling functions both theoretically and empirically, and establish the linear softmax pooling function as the optimal choice among them. Using this function, we build a state-of-the-art network that not only recognizes the types of sound events, but also localizes them temporally. SED with sequential labeling has received little attention so far. In this thesis, we propose a novel connectionist temporal localization (CTL) framework, which successfully exploits the extra temporal information that sequential labeling provides compared with presence/absence labeling. Transfer learning is a popular technique for dealing with insufficient training data. In this thesis, we extract features from two neural networks trained for out-of-domain tasks, and show that these features can improve SED performance when the training corpus is small.
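The abstract does not spell out what a MIL pooling function computes. As an illustration only, the following is a minimal sketch assuming the usual formulation: a network outputs a frame-level probability y_i for one event class, and the pooling function turns these into a single recording-level probability y. Linear softmax pooling is taken here as y = Σ y_i² / Σ y_i, i.e. each frame is weighted by its own probability. Max and average pooling are included only as common points of comparison; this is not a claim about which six functions the thesis compares, and it is not the author's code. The function names, the `eps` guard and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def max_pooling(frame_probs: Sequence[float]) -> float:
    """Recording-level probability = the largest frame-level probability."""
    return max(frame_probs)


def average_pooling(frame_probs: Sequence[float]) -> float:
    """Recording-level probability = the mean frame-level probability."""
    return sum(frame_probs) / len(frame_probs)


def linear_softmax_pooling(frame_probs: Sequence[float], eps: float = 1e-7) -> float:
    """Weight each frame by its own probability: sum(y_i^2) / sum(y_i).

    `eps` (a hypothetical safeguard) avoids division by zero when all frames are 0.
    """
    return sum(p * p for p in frame_probs) / (sum(frame_probs) + eps)


def linear_softmax_grad(frame_probs: Sequence[float], eps: float = 1e-7) -> List[float]:
    """Partial derivatives dy/dy_i = (2 * y_i - y) / sum_j y_j of the pooled output y.

    The derivative is positive exactly for frames whose probability exceeds y / 2.
    """
    total = sum(frame_probs) + eps
    y = linear_softmax_pooling(frame_probs, eps)
    return [(2 * p - y) / total for p in frame_probs]


if __name__ == "__main__":
    # Hypothetical frame-level probabilities of one event class in one recording.
    probs = [0.1, 0.1, 0.9, 0.8, 0.1]
    print("max:           ", max_pooling(probs))
    print("average:       ", average_pooling(probs))
    print("linear softmax:", linear_softmax_pooling(probs))
    print("gradient:      ", linear_softmax_grad(probs))
```

For non-negative probabilities, linear softmax pooling always lies between average and max pooling (ignoring `eps`), since Σ y_i² ≥ (Σ y_i)² / n and Σ y_i² ≤ max(y_i) · Σ y_i. The gradient function shows why this matters for localization: raising y raises frames with y_i > y/2 and lowers the others, so in a recording labeled as containing the event, training tends to sharpen the frame-level predictions rather than spread them evenly across frames.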
