uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

Afrina Tabassum; Dung Tran; Trung Dang; Ismini Lourentzou; Kazuhito Koishida

doi:10.1109/icassp48485.2024.10446342

JOURNAL ARTICLE

Year: 2024 Pages: 5435-5439

Abstract

Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks. Conversely, Instance Discrimination (ID) emphasizes high-level semantics, offering a potential solution to alleviate annotation requirements in MAEs. Although combining these two approaches can address downstream tasks with limited labeled data, naively integrating ID into MAEs leads to extended training times and high computational costs. To address this challenge, we introduce uaMix-MAE, an efficient ID tuning strategy that leverages unsupervised audio mixtures. Utilizing contrastive tuning, uaMix-MAE aligns the representations of pretrained MAEs, thereby facilitating effective adaptation to task-specific semantics. To optimize the model with small amounts of unlabeled data, we propose an audio mixing technique that manipulates audio samples in both input and virtual label spaces. Experiments in low/few-shot settings demonstrate that uaMix-MAE achieves 4-6% accuracy improvements over various benchmarks when tuned with limited unlabeled data, such as AudioSet-20K.
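The abstract does not give the mixing formula, so the following is only a minimal sketch of what mixing "in both input and virtual label spaces" could look like. It assumes a mixup-style interpolation in which each unlabeled clip's batch index serves as its virtual label. The names `mix_batch` and `soft_contrastive_loss`, the mixing weight `lam`, the temperature, and the flattening "encoder" are all hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def mix_batch(x, lam, rng):
    """Mix each sample with a randomly chosen partner from the same batch.

    Assumption: virtual labels are one-hot instance identities (row i of the
    identity matrix), interpolated with the same weight as the inputs.
    """
    n = x.shape[0]
    perm = rng.permutation(n)
    x_mixed = lam * x + (1.0 - lam) * x[perm]              # input space
    virtual = np.eye(n)
    y_mixed = lam * virtual + (1.0 - lam) * virtual[perm]  # virtual label space
    return x_mixed, y_mixed

def soft_contrastive_loss(anchor, target, y_mixed, temperature=0.2):
    """Cross-entropy between softmax cosine similarities and soft virtual labels."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = a @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(y_mixed * log_prob).sum(axis=1).mean())

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 16))   # 4 hypothetical log-mel spectrograms
x_mixed, y_mixed = mix_batch(x, lam=0.7, rng=rng)

# Stand-in "encoder": flatten each spectrogram into a vector.
loss = soft_contrastive_loss(x_mixed.reshape(4, -1), x.reshape(4, -1), y_mixed)
```

In an actual tuning setup the flattening step would be replaced by the pretrained MAE encoder, and `lam` would typically be sampled per batch rather than fixed; both choices here are illustrative only.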

Keywords:

Computer science; Transformer; Artificial intelligence; Annotation; Labeled data; Semantics (computer science); Unsupervised learning; Speech recognition; Machine learning

Topics

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Efficient Training of Audio Transformers with Patchout

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

Audio Transformers

Tuning Renesas Audio Products with Audio Tuning Tool (ATT)

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition (Pretrained Models)