Spectrogram Transformers for Audio Classification

Yixiao Zhang; Baihua Li; Hui Fang; Qinggang Meng

doi:10.1109/ist55454.2022.9827729

JOURNAL ARTICLE

Year: 2022 Pages: 1-6

Abstract

Audio classification is an important machine learning task with a wide range of applications. Over the last decade, deep learning based methods have been widely used, and transformer-based models are becoming a new paradigm for audio classification. In this paper, we present Spectrogram Transformers, a group of transformer-based models for audio classification. Based on the fundamental semantics of the audio spectrogram, we design two mechanisms, named time-dimension sampling and frequency-dimension sampling, to extract temporal and frequency features from it. These discriminative representations are then enhanced by various combinations of attention block architectures, including Temporal Only (TO) attention, Temporal-Frequency Sequential (TFS) attention, Temporal-Frequency Parallel (TFP) attention, and Two-stream Temporal-Frequency (TSTF) attention, to extract the sound record signatures that serve the classification task. Our experiments demonstrate that these Transformer models outperform the state-of-the-art methods on the ESC-50 dataset without a pre-training stage. Furthermore, our method also shows great efficiency compared with other leading methods.
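The abstract does not spell out how time-dimension and frequency-dimension sampling work, so the following is only a rough sketch under stated assumptions: each time frame of the spectrogram is treated as one token for the temporal branch, each frequency bin as one token for the frequency branch, and a single-head self-attention with identity projections stands in for an attention block. The function names (`time_tokens`, `freq_tokens`, `self_attention`, `mean_pool`) and the toy spectrogram are hypothetical and are not taken from the paper.

```python
import math

def time_tokens(spec):
    """Assumed time-dimension sampling: one token per time frame.
    spec is [n_freq][n_time]; the result is [n_time][n_freq]."""
    return [list(col) for col in zip(*spec)]

def freq_tokens(spec):
    """Assumed frequency-dimension sampling: one token per frequency bin.
    Each row of spec is already a token of length n_time."""
    return [list(row) for row in spec]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Queries, keys and values are the tokens themselves (no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def mean_pool(tokens):
    """Average all tokens into one feature vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]

# Toy spectrogram (hypothetical values): 3 frequency bins x 4 time frames.
spec = [
    [0.1, 0.2, 0.3, 0.4],
    [0.0, 0.5, 0.5, 0.0],
    [0.9, 0.1, 0.1, 0.9],
]

t_tok = time_tokens(spec)  # 4 tokens of dimension 3
f_tok = freq_tokens(spec)  # 3 tokens of dimension 4

# Attend over each token set separately, pool, and concatenate the results.
feature = mean_pool(self_attention(t_tok)) + mean_pool(self_attention(f_tok))
```

Attending over the two token sets separately and concatenating the pooled outputs is loosely in the spirit of a two-stream arrangement; how the TO, TFS, TFP, and TSTF blocks actually combine temporal and frequency attention is described in the full paper, not in this abstract.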

Keywords:

Spectrogram; Computer science; Discriminative model; Transformer; Speech recognition; Artificial intelligence; Pattern recognition (psychology); Engineering

Metrics

FWCI (Field Weighted Citation Impact): 4.29

Citation Normalized Percentile: 0.93

Topics

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music Technology and Sound Studies

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

MAST: Multiscale Audio Spectrogram Transformers

Vocal Biomarkers for Parkinson’s Disease Classification Using Audio Spectrogram Transformers

Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

Multiscale Audio Spectrogram Transformer for Efficient Audio Classification