JOURNAL ARTICLE

Exploring Neural Audio Codec-based Contrastive Language-Audio Pre-training

Huang, Du

Year: 2024
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

This thesis investigates the potential of Neural Audio Codecs (NACs) to enrich the audio representation capabilities of Contrastive Language-Audio Pre-training (CLAP) models. We introduce an evaluation approach that systematically compares CLAP configurations using distinct audio encoder modules on the text-to-audio retrieval task. Our experimental analysis indicates that NAC-based modules offer superior feature discrimination and retrieval efficacy. The research presents a methodological framework for NAC integration in CLAP models, sets new performance benchmarks, and outlines future directions, emphasizing the development of universal audio embeddings and refined pre-training techniques. Our code is available at https://github.com/duduOliver/SMC_CodecCLAP.

Keywords:
Encoder, Feature (linguistics), Representation (politics), Codec, Audio analyzer, Encoding (memory)

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.62

Topics

Medical History and Research
Social Sciences → Arts and Humanities → History
Williams Syndrome Research
Life Sciences → Neuroscience → Developmental Neuroscience
Medical and Biological Sciences
Health Sciences → Medicine → Anatomy