JOURNAL ARTICLE

Weakly Labeled Sound Event Detection using Attention Mechanism with Teacher-Student Model

Abstract

Sound Event Detection (SED) identifies and categorizes sound events within audio signals. In this study, we investigate the role of self-attention and multi-head attention mechanisms in enhancing SED performance with a teacher-student learning model in the context of knowledge distillation. The study contributes to the development of SED methodologies by focusing on detecting events and their temporal boundaries (onset and offset) in weakly labeled and unlabeled audio. The attention mechanism allows the model to focus on different parts of the audio sequence depending on the context, making it robust on tasks where temporal relationships and context matter. Specifically, we present extensive experiments using low- and high-level audio features, including Mel Frequency Cepstral Coefficients (MFCC), Log Mel-Spectrogram (log-Mel), Bidirectional Encoder representation from Audio Transformers (BEATs), Audio Spectrogram Transformer (AST), and Pretrained Audio Neural Networks (PANNs), to assess the performance of individual features with different attention mechanisms. We evaluate the attention mechanisms and the performance of the low- and high-level features with the baseline teacher-student model of the Sound Event Detection with Weak Labels and Synthetic Soundscapes Challenge. Our experiments on the performance dataset show that the proposed attention-based model improves the F1 scores for all features.
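As an illustration of how attention lets each audio frame draw on the rest of the sequence, the following is a minimal sketch of scaled dot-product self-attention and a multi-head variant over frame-level feature vectors. It is not the model evaluated in the study: the learned query, key, value, and output projections are omitted (identity projections are assumed), the heads simply split the feature dimension, and the function names and toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(frames):
    """Return one row of attention weights per frame.

    Assumption: queries and keys are the frames themselves
    (no learned projections), scaled by sqrt(feature dimension).
    """
    scale = math.sqrt(len(frames[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in frames])
        for query in frames
    ]


def self_attention(frames):
    """Replace each frame with an attention-weighted average of all frames."""
    dim = len(frames[0])
    return [
        [sum(w * frame[j] for w, frame in zip(row, frames)) for j in range(dim)]
        for row in attention_weights(frames)
    ]


def multi_head_self_attention(frames, num_heads):
    """Run self-attention independently on equal slices of the feature axis.

    Assumption: each head sees a contiguous slice of the features and the
    head outputs are concatenated without a final output projection.
    """
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        sub = [frame[h * head_dim:(h + 1) * head_dim] for frame in frames]
        head_outputs.append(self_attention(sub))
    return [
        [value for head in head_outputs for value in head[t]]
        for t in range(len(frames))
    ]


# Toy sequence: three frames with four (hypothetical) feature values each.
frames = [
    [1.0, 0.0, 0.5, 0.5],
    [0.9, 0.1, 0.4, 0.6],
    [0.0, 1.0, 0.5, 0.5],
]
contextual = multi_head_self_attention(frames, num_heads=2)
```

In this toy sequence the first frame attends more strongly to the second frame, which resembles it, than to the third. In an SED model the frame vectors would come from features such as log-Mel or pretrained embeddings, and the projections would be learned.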

Keywords:
Mechanism (biology), Computer science, Event (particle physics), Sound (geography), Human–computer interaction, Acoustics, Physics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 33
Citation Normalized Percentile: 0.24

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing