This thesis investigates the potential of Neural Audio Codecs (NACs) to enrich the audio representation capabilities of Contrastive Language-Audio Pretraining (CLAP) models. We introduce an evaluation approach to systematically compare CLAP configurations that use distinct audio encoder modules on the text-to-audio retrieval task. Our experimental analysis indicates that NAC-based modules offer superior feature discrimination and retrieval efficacy. The research presents a methodological framework for NAC integration in CLAP models, sets new performance benchmarks, and outlines future directions, emphasizing the development of universal audio embeddings and refined pre-training techniques. Our code is available at https://github.com/duduOliver/SMC_CodecCLAP.
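As a point of reference for the text-to-audio retrieval evaluation mentioned above, the standard protocol ranks audio clips by cosine similarity to each text query in the shared CLAP embedding space and reports recall@k. The sketch below is a minimal, hypothetical illustration of that metric (the function name and toy data are assumptions, not the authors' code):

```python
import numpy as np

def recall_at_k(text_emb, audio_emb, k=5):
    """Fraction of text queries whose paired audio clip (same row index)
    appears among the top-k audio matches by cosine similarity."""
    # L2-normalize so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    sims = t @ a.T                              # (n_text, n_audio)
    topk = np.argsort(-sims, axis=1)[:, :k]     # indices of top-k audio per query
    hits = (topk == np.arange(len(t))[:, None]).any(axis=1)
    return hits.mean()

# Toy sanity check: identical embeddings yield perfect top-1 retrieval.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
print(recall_at_k(emb, emb, k=1))  # 1.0
```

In practice the two embedding matrices would come from the CLAP text encoder and the audio encoder under comparison (e.g., a NAC-based module), evaluated on a held-out captioned-audio test set.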