JOURNAL ARTICLE

Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian

Year: 2020 Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing Pages: 1-1 Publisher: Institute of Electrical and Electronics Engineers

Abstract

In this article, we propose a specialized decision surface (SDS) for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED. We approach SED as a multiple instance learning (MIL) problem and use a neural network framework with a pooling module to solve it. General MIL approaches fall into two kinds: instance-level approaches and embedding-level approaches. Embedding-level approaches tend to outperform instance-level approaches in bag-level classification, but in their current form they cannot provide instance-level probabilities; we present a method of generating instance-level probabilities for them. We further propose an SDS for embedding-level attention pooling, and we analyze and explain, from the perspective of the high-level feature space, why an embedding-level attention module with SDS is better than other typical pooling modules. To address the unbalanced dataset and the co-occurrence of multiple categories in the polyphonic event detection task, we propose a DF to reduce interference among categories; it optimizes the high-level feature space by disentangling it based on class-wise identifiable information, yielding multiple different subspaces. Experiments on the DCASE 2018 Task 4 dataset show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module and outperform the first-place system in the challenge by 6.6 percentage points.
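The MIL framing in the abstract can be illustrated with a minimal NumPy sketch of embedding-level attention pooling, where a clip is the bag and its frames are the instances. This is not the authors' implementation: the function name, the weight matrices `w_att` and `w_cls`, and the random inputs are hypothetical, and neither the SDS nor the DF is implemented. The frame-level probabilities are obtained here by applying the clip-level classifier to each frame embedding, which is only one plausible choice; the abstract does not say how the paper derives them.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_pool(frames, w_att, w_cls, b_cls):
    """Embedding-level attention pooling (illustrative sketch).

    frames: (T, D) frame embeddings (the instances of one bag/clip)
    w_att:  (D, C) hypothetical attention weights, one column per class
    w_cls:  (D, C) hypothetical linear classifier weights
    b_cls:  (C,)   hypothetical classifier biases
    """
    # Per-class attention over time; each column sums to 1.
    att = softmax(frames @ w_att, axis=0)          # (T, C)
    # One pooled (bag-level) embedding per class.
    bag_emb = att.T @ frames                       # (C, D)
    # Class c is scored from its own pooled embedding.
    bag_logit = np.einsum("cd,dc->c", bag_emb, w_cls) + b_cls
    bag_prob = sigmoid(bag_logit)                  # (C,)
    # Assumption: reuse the same classifier on each frame embedding
    # to get instance-level probabilities.
    frame_prob = sigmoid(frames @ w_cls + b_cls)   # (T, C)
    return bag_prob, frame_prob, att

# Toy usage: 10 frames, 4-dimensional embeddings, 3 event classes.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))
w_att = rng.normal(size=(4, 3))
w_cls = rng.normal(size=(4, 3))
b_cls = rng.normal(size=3)
bag_prob, frame_prob, att = attention_pool(frames, w_att, w_cls, b_cls)
```

With a linear classifier, the clip-level logit in this sketch equals the attention-weighted average of the frame-level logits, which is why clip-level labels alone can still shape frame-level predictions.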

Keywords:
Embedding, Pooling, Computer science, Feature (linguistics), Pattern recognition (psychology), Artificial intelligence, Event (particle physics), Polyphony, Feature vector, Machine learning

Metrics

Cited By: 28
FWCI (Field Weighted Citation Impact): 3.10
Refs: 53
Citation Normalized Percentile: 0.92

Topics

Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Video Analysis and Summarization
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Towards Duration Robust Weakly Supervised Sound Event Detection

Heinrich Dinkel, Mengyue Wu, Kai Yu

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing Year: 2021 Vol: 29 Pages: 887-900

JOURNAL ARTICLE

Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection

Lijian Gao, Ling Zhou, Qirong Mao, Ming Dong

Journal: Proceedings of the 30th ACM International Conference on Multimedia Year: 2022 Pages: 1779-1787