Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection

Shuo Li; Yan Song; Ian McLoughlin; Lin Liu; Jin Li; Li-Rong Dai

doi:10.21437/interspeech.2023-1174

JOURNAL ARTICLE

Year: 2023 Pages: 291-295

Abstract

In this paper, we present a task-aware fine-tuning method to transfer the Patchout faSt Spectrogram Transformer (PaSST) model to the sound event detection (SED) task. Pretrained PaSST has shown strong performance on audio tagging (AT) and SED tasks, but fine-tuning the model from a single layer is suboptimal because local and semantic information are not well exploited. To address this, we first introduce task-aware adapters, namely an SED-adapter and an AT-adapter, to fine-tune PaSST for the SED and AT tasks respectively. Based on these adapters, we then propose task-aware fine-tuning, which combines local information from a shallower layer with semantic information from a deeper layer. In addition, we propose the self-distillated mean teacher (SdMT) to train a robust student model with soft pseudo labels from the teacher. Experiments on the DCASE2022 Task 4 development set achieve an EB-F1 of 64.85% and a PSDS1 of 0.5548, outperforming previous state-of-the-art systems.
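The abstract gives no implementation details for SdMT, so the following is only a minimal sketch of the generic mean-teacher idea it builds on: the teacher's weights track an exponential moving average (EMA) of the student's weights, and the student is trained against the teacher's soft (probabilistic) predictions rather than thresholded hard labels. The function names, the `decay` value, and the use of binary cross-entropy as the consistency loss are assumptions made for illustration, not the authors' method.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """Return teacher weights moved toward the student weights (EMA).

    `decay` is a hypothetical hyperparameter; the abstract does not state one.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_label_bce(student_logits, teacher_probs, eps=1e-7):
    """Binary cross-entropy of student predictions against soft teacher targets.

    Each teacher probability is used directly as the target (a soft pseudo
    label). The loss is averaged over classes. Assumed loss, for illustration.
    """
    total = 0.0
    for z, q in zip(student_logits, teacher_probs):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    return total / len(student_logits)


# Toy usage: one EMA step and one consistency-loss evaluation.
teacher_weights = ema_update([0.0, 1.0], [1.0, 3.0], decay=0.9)
loss = soft_label_bce(student_logits=[2.0, -1.0], teacher_probs=[0.9, 0.2])
```

In a full training loop the consistency loss would typically be added to the supervised loss on labeled clips, and `ema_update` would be applied to the teacher after every student optimizer step.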

Keywords: Spectrogram; Computer science; Transformer; Speech recognition; Audio signal processing; Task (project management); Audio signal; Engineering; Electrical engineering; Speech coding; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 1.61

Citation Normalized Percentile: 0.81

Topics

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Neural Networks and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

A Sequential Audio Spectrogram Transformer for Real-Time Sound Event Detection

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer

AST: Audio Spectrogram Transformer