JOURNAL ARTICLE

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

Abstract

Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, the task remains challenging because medical data are scarce. In this study, we demonstrate that a model pretrained on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with the Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning method to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, improving on the previous best score by 4.08%.
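The abstract describes Patch-Mix and Patch-Mix Contrastive Learning only at a high level. The following is a minimal pure-Python sketch of one plausible reading of both ideas; it is not the authors' implementation. Assumptions not stated in the source: each spectrogram is already split into a sequence of equal-length patches, the same randomly chosen patch positions are swapped for the whole batch, each sample's partner comes from a random permutation of the batch, mean pooling stands in for the AST encoder, and the contrastive objective is a λ-weighted InfoNCE-style loss that pulls a mixed representation toward its original sample with weight λ and toward its partner with weight 1 − λ. The names `patch_mix`, `mean_pool`, `cosine`, and `patch_mix_contrastive_loss`, and the temperature `tau`, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Patch = List[float]   # one flattened spectrogram patch (assumed representation)
Sample = List[Patch]  # one spectrogram as a sequence of N patches


def patch_mix(batch: Sequence[Sample], lam: float,
              rng: random.Random) -> Tuple[List[Sample], List[int], float]:
    """Hypothetical Patch-Mix: replace about (1 - lam) of each sample's patches
    with the patches at the same positions in a randomly paired sample."""
    n_patches = len(batch[0])
    perm = list(range(len(batch)))
    rng.shuffle(perm)  # the partner of sample i is batch[perm[i]]
    n_swap = round((1.0 - lam) * n_patches)
    swap_idx = set(rng.sample(range(n_patches), n_swap))
    mixed = [
        [list(batch[perm[i]][j]) if j in swap_idx else list(sample[j])
         for j in range(n_patches)]
        for i, sample in enumerate(batch)
    ]
    kept = 1.0 - n_swap / n_patches  # realized fraction of patches kept from sample i
    return mixed, perm, kept


def mean_pool(sample: Sample) -> List[float]:
    """Stand-in encoder (assumption): average the patches into one vector."""
    return [sum(col) / len(sample) for col in zip(*sample)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def patch_mix_contrastive_loss(z_orig: Sequence[Sequence[float]],
                               z_mixed: Sequence[Sequence[float]],
                               perm: Sequence[int], kept: float,
                               tau: float = 0.1) -> float:
    """Assumed lambda-weighted InfoNCE: each mixed representation should match
    its original sample (weight kept) and its partner (weight 1 - kept)."""
    total = 0.0
    for i, z in enumerate(z_mixed):
        logits = [cosine(z, zk) / tau for zk in z_orig]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total -= kept * (logits[i] - log_norm) + (1.0 - kept) * (logits[perm[i]] - log_norm)
    return total / len(z_mixed)


# Usage: 3 toy "spectrograms", each with 8 patches of 4 values.
rng = random.Random(0)
batch = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)] for _ in range(3)]
mixed, perm, kept = patch_mix(batch, lam=0.75, rng=rng)
loss = patch_mix_contrastive_loss([mean_pool(s) for s in batch],
                                  [mean_pool(s) for s in mixed], perm, kept)
print(f"kept={kept:.2f} partners={perm} loss={loss:.4f}")
```

In this sketch, swapping patches at shared positions keeps each mixed spectrogram's time-frequency layout intact, which fits a transformer that consumes its input as a patch sequence. Weighting the loss by the realized fraction `kept` rather than the requested `lam` avoids a mismatch when (1 − lam) × N is not an integer.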

Keywords:
Spectrogram, Computer science, Speech recognition, Sound, Transformer, Acoustics, Engineering, Electrical engineering

Metrics

Cited by: 46
FWCI (Field Weighted Citation Impact): 15.52
References: 35
Citation Normalized Percentile: 0.99 (in the top 1% and top 10%)

Topics

Phonocardiography and Auscultation Techniques
Health Sciences →  Medicine →  Pulmonary and Respiratory Medicine
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing