JOURNAL ARTICLE

Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization

Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Year: 2020 Journal: IEEE/ACM Transactions on Audio Speech and Language Processing Vol: 28 Pages: 2450-2460 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of the SED task is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled. That is, there are only audio tags for each audio clip, without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison that is lacking in previous works. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that the CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not an optimal approach. To solve this problem, we propose an automatic threshold optimization method. The first stage is to optimize the system with respect to metrics that do not depend on thresholds, such as mean average precision (mAP). The second stage is to optimize the thresholds with respect to metrics that depend on those thresholds. Our proposed automatic threshold optimization system achieves a state-of-the-art audio tagging F1 of 0.646, outperforming the 0.629 obtained without threshold optimization, and a sound event detection F1 of 0.584, outperforming the 0.564 obtained without threshold optimization.
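The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the thresholds are optimized, so the code below substitutes a plain per-class grid search over F1; this is an assumption made for illustration, not the authors' method. The function names (`average_precision`, `f1_score`, `optimize_thresholds`), the threshold grid, and the toy data are all hypothetical. Stage one is represented by average precision, which depends only on the ranking of scores. Stage two picks, for each class, the threshold that maximizes F1 on held-out predictions.

```python
import numpy as np


def average_precision(y_true, scores):
    """Threshold-free metric: mean of precision at each positive's rank."""
    order = np.argsort(-scores, kind="stable")  # sort by descending score
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / n_pos)


def f1_score(y_true, y_pred):
    """Threshold-dependent metric computed from binary decisions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def optimize_thresholds(y_true, probs, grid=None):
    """Per-class grid search (an assumed stand-in for the paper's optimizer).

    y_true: (n_clips, n_classes) binary targets.
    probs:  (n_clips, n_classes) predicted probabilities.
    Returns one threshold per class, each maximizing that class's F1.
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # hypothetical search grid
    n_classes = probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for k in range(n_classes):
        scores = [
            f1_score(y_true[:, k], (probs[:, k] >= t).astype(int))
            for t in grid
        ]
        thresholds[k] = grid[int(np.argmax(scores))]
    return thresholds


# Toy validation data (made up): class 0 is predicted with low confidence.
y_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
probs = np.array([[0.42, 0.11], [0.33, 0.21], [0.12, 0.91], [0.22, 0.83]])

thresholds = optimize_thresholds(y_true, probs)
y_pred = (probs >= thresholds).astype(int)
```

In this toy case a fixed threshold of 0.5 misses every class 0 event, while the searched thresholds recover them. In practice the thresholds would be tuned on validation data and then applied unchanged to the evaluation set.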

Keywords:
Computer science, Convolutional neural network, Transformer, Pattern recognition (psychology), Speech recognition, Artificial intelligence, Offset (computer science)

Metrics

Cited By: 121
FWCI (Field Weighted Citation Impact): 12.56
Refs: 75
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music Technology and Sound Studies
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

DISSERTATION

Sound event detection with weakly labelled data

Qiuqiang Kong

University:   Surrey Research Insight Open Access (The University of Surrey) Year: 2020
JOURNAL ARTICLE

Sound Event Detection and Time–Frequency Segmentation from Weakly Labelled Data

Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Journal:   IEEE/ACM Transactions on Audio Speech and Language Processing Year: 2019 Vol: 27 (4) Pages: 777-787
JOURNAL ARTICLE

Sound Event Detection: A Wavelet Based Approach For Weakly Labelled Data

Amit Kumar, Vinal Patel

Journal:   2021 IEEE Bombay Section Signature Conference (IBSSC) Year: 2021 Vol: 52 Pages: 1-4
JOURNAL ARTICLE

Weakly labeled sound event detection with a capsule-transformer model

K. L. Li, Shuguo Yang, Li Zhao, Wenwu Wang

Journal:   Digital Signal Processing Year: 2023 Vol: 146 Pages: 104347-104347
JOURNAL ARTICLE

CNN-Transformer with Self-Attention Network for Sound Event Detection

Keigo Wakayama, Shoichiro Saito

Journal:   ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Year: 2022