Xu Wang, Xiangjinzi Zhang, Yunfei Zi, Shengwu Xiong
Sound event detection (SED) consists of two subtasks: predicting the classes of sound events within an audio clip (audio tagging) and indicating the onset and offset times of each event (localization). A common approach to SED with weak labels is the multiple instance learning (MIL) method. However, the general MIL method optimizes only the global loss computed from the aggregated clip-wise predictions and the weak clip labels, with no direct constraint on the frame-wise predictions, which leads to a large number of unreasonable prediction values. To address this issue, we explore the deterministic information that can be used to constrain the frame-wise predictions and, based on it, design a frame loss with two terms. Experimental results on the DCASE 2017 Task 4 dataset demonstrate that the proposed loss improves the performance of the general MIL method. While this article focuses on SED applications, the proposed method could be applied broadly to MIL problems. Code will be available at WSSED.
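The global MIL loss the abstract refers to can be sketched as follows: frame-wise probabilities are pooled into a single clip-wise prediction, which is compared against the weak clip label. This is a minimal illustration, not the paper's implementation; the linear softmax pooling used here is one common aggregation choice in weakly supervised SED, and the abstract does not specify the paper's pooling function or the two frame-loss terms.

```python
import numpy as np

def mil_clip_loss(frame_probs, clip_labels, eps=1e-7):
    """Global MIL loss (hypothetical sketch): aggregate frame-wise event
    probabilities into clip-wise predictions, then take binary cross-entropy
    against the weak clip labels.

    frame_probs: (T, C) per-frame event probabilities in (0, 1)
    clip_labels: (C,) weak 0/1 labels for the whole clip
    """
    # Linear softmax pooling: each frame is weighted by its own probability,
    # so confident frames dominate the clip-level prediction.
    clip_probs = (frame_probs ** 2).sum(axis=0) / (frame_probs.sum(axis=0) + eps)
    clip_probs = np.clip(clip_probs, eps, 1 - eps)
    # Binary cross-entropy between clip-wise predictions and weak labels.
    bce = -(clip_labels * np.log(clip_probs)
            + (1 - clip_labels) * np.log(1 - clip_probs))
    return bce.mean()
```

Note that this loss touches the frame-wise predictions only through the pooled clip probability, which is exactly the gap the abstract's frame loss is designed to fill: many different frame-wise profiles can yield the same clip-wise prediction and thus the same global loss.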
Yu Tian, Guansong Pang, Fengbei Liu, Yuyuan Liu, Chong Wang, Yuanhong Chen, Johan Verjans, Gustavo Carneiro
Wei Gao, Fang Wan, Jun Yue, Songcen Xu, Qixiang Ye
Hui Lv, Zhongqi Yue, Qianru Sun, Bin Luo, Zhen Cui, Hanwang Zhang
Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian