JOURNAL ARTICLE

Improving Weakly Supervised Sound Event Detection with Causal Intervention

Abstract

Existing work on weakly supervised sound event detection (WSSED) has not addressed two types of co-occurrence simultaneously: some sound events often occur together, and their occurrences are usually accompanied by specific background sounds. With only clip-level supervision, these co-occurring sounds inevitably become entangled, causing misclassification and biased localization. To tackle this issue, we first establish a structural causal model (SCM), which reveals that context is the main source of the co-occurrence confounders that mislead the model into learning spurious correlations between frames and clip-level labels. Based on this causal analysis, we propose a causal intervention (CI) method for WSSED that removes the negative impact of co-occurrence confounders by iteratively accumulating every possible context of each class and then re-projecting the contexts onto the frame-level features to make event boundaries clearer. Experiments show that our method effectively improves performance on multiple datasets and generalizes to various baseline models.
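
The accumulate-then-re-project idea can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the update rule, so the running-average memory, the activation-weighted pooling, the additive re-projection, and all names (`pool_context`, `accumulate_context`, `reproject`, `momentum`) are assumptions made purely for illustration.

```python
# Hypothetical sketch of "accumulate class contexts, then re-project them
# onto frame-level features". All names and update rules are assumptions.

def pool_context(frames, activations, k):
    """Activation-weighted mean of the frame features for class k."""
    dim = len(frames[0])
    total = sum(a[k] for a in activations)
    if total == 0.0:
        return [0.0] * dim
    return [
        sum(a[k] * f[d] for f, a in zip(frames, activations)) / total
        for d in range(dim)
    ]


def accumulate_context(memory, frames, activations, clip_labels, momentum=0.9):
    """Running-average update of the per-class context memory.

    Only classes marked present in the clip-level label are updated.
    """
    updated = [row[:] for row in memory]
    for k, present in enumerate(clip_labels):
        if present:
            ctx = pool_context(frames, activations, k)
            updated[k] = [
                momentum * m + (1.0 - momentum) * c
                for m, c in zip(memory[k], ctx)
            ]
    return updated


def reproject(memory, frames, activations):
    """Add the activation-weighted class contexts back onto each frame."""
    enhanced = []
    for f, a in zip(frames, activations):
        ctx = [
            sum(a[k] * memory[k][d] for k in range(len(memory)))
            for d in range(len(f))
        ]
        enhanced.append([x + c for x, c in zip(f, ctx)])
    return enhanced


# Toy clip: 2 frames, 2 feature dimensions, 2 classes; only class 0 is labeled.
frames = [[1.0, 0.0], [0.0, 1.0]]
activations = [[1.0, 0.0], [0.0, 1.0]]  # frame-level class activations
clip_labels = [1, 0]                    # weak (clip-level) label
memory = [[0.0, 0.0], [0.0, 0.0]]       # one context vector per class

memory = accumulate_context(memory, frames, activations, clip_labels, momentum=0.5)
enhanced = reproject(memory, frames, activations)
```

In this toy setup only the frame activated for the labeled class receives an added context vector; in a real model the memory would be accumulated over many clips and training iterations before being re-projected.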

Keywords:
Spurious relationship; Computer science; Context (archaeology); Event (particle physics); Class (philosophy); Confounding; Frame (networking); Artificial intelligence; Baseline (sea); Machine learning; Statistics; Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.27
Refs: 25
Citation Normalized Percentile: 0.41

Topics

Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Music Technology and Sound Studies
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Towards Duration Robust Weakly Supervised Sound Event Detection

Heinrich Dinkel, Mengyue Wu, Kai Yu

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing Year: 2021 Vol: 29 Pages: 887-900

JOURNAL ARTICLE

Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection

Lijian Gao, Ling Zhou, Qirong Mao, Ming Dong

Journal: Proceedings of the 30th ACM International Conference on Multimedia Year: 2022 Pages: 1779-1787