CONFERENCE PAPER

Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers

Paul Primus, Gerhard Widmer

Year: 2022 | Venue: 2022 30th European Signal Processing Conference (EUSIPCO) | Pages: 410-413

Abstract

Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training. Zero-Shot (ZS) learning overcomes this restriction by predicting classes based on adaptable class descriptions. This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning. To this end, we compare the very recent patchout spectrogram transformer with two classic convolutional architectures. We evaluate these three architectures on three tasks and on three different benchmark datasets: general-purpose tagging on AudioSet, environmental sound classification on ESC-50, and instrument tagging on OpenMIC. Our results show that the self-attention-based embedding methods outperform both compared convolutional architectures in all of these settings. By designing training and test data accordingly, we observe that prediction performance suffers significantly when the 'semantic distance' between training and new test classes is large, an effect that will deserve more detailed investigations.
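The core idea described in the abstract, predicting unseen classes by comparing an audio embedding against embeddings of adaptable class descriptions, can be sketched with a minimal similarity-matching example. This is an illustrative sketch only, not the authors' implementation: the toy vectors below stand in for what a real system would produce with an audio encoder (e.g. a patchout spectrogram transformer) and a text encoder for the class descriptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(audio_emb, class_embs, class_names):
    """Score a clip against every class-description embedding.

    Unseen classes only require a new description vector;
    no retraining of the audio encoder is needed.
    """
    scores = [cosine_similarity(audio_emb, c) for c in class_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best], scores

# Toy, hand-made embeddings (hypothetical values for illustration).
audio_emb = [1.0, 0.0, 0.2]
class_embs = [[0.9, 0.1, 0.1],   # description embedding for "dog bark"
              [0.0, 1.0, 0.0]]   # description embedding for "siren"
class_names = ["dog bark", "siren"]

label, scores = zero_shot_classify(audio_emb, class_embs, class_names)
print(label)  # -> dog bark
```

Because classification reduces to nearest-neighbour search in the shared embedding space, extending the label set at test time is just a matter of embedding new class descriptions, which is what makes the zero-shot setting possible.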

Keywords:
Spectrogram, Computer science, Transformer, Speech recognition, Shot (pellet), Artificial intelligence, Pattern recognition (psychology), Engineering, Materials science, Electrical engineering

Metrics

Cited By: 5
References: 39
FWCI (Field-Weighted Citation Impact): 0.70
Citation Normalized Percentile: 0.68

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
