JOURNAL ARTICLE

Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection

Abstract

<p dir="ltr">In this paper, we present a task-aware fine-tuning method to transfer Patchout faSt Spectrogram Transformer (PaSST) model to sound event detection (SED) task. Pretrained PaSST has shown significant performance on audio tagging (AT) and SED tasks, but it is not optimal to fine-tune the model from a single layer as the local and semantic information have not been well exploited. To address this, we first introduce task-aware adapters including SED-adapter and AT-adapter to fine-tune PaSST for SED and AT task respectively, and then propose task-aware fine-tuning to combine local information from shallower layer with semantic information from deeper layer, based on task-aware adapters. Besides, we propose the self-distillated mean teacher (SdMT) to train a robust student model with soft pseudo labels from teacher. Experiments are conducted on DCASE2022 task4 development set, the EB-F1 of 64.85% and PSDS1 of 0.5548 are achieved which outperform previous state-of-the-art systems.</p>

Keywords:
Spectrogram Computer science Transformer Speech recognition Audio signal processing Task (project management) Audio signal Engineering Electrical engineering Speech coding Voltage

Metrics

6
Cited By
1.61
FWCI (Field Weighted Citation Impact)
0
Refs
0.81
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Neural Networks and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
© 2026 ScienceGate Book Chapters — All rights reserved.