Enhancing text-audio generation by music classification and Retrieval-Augmented Generation

Runyu He; Junyi Zhu; Bingying Wang; Yixuan Yin

doi:10.54254/2755-2721/68/20241505

JOURNAL ARTICLE

Year: 2024; Journal: Applied and Computational Engineering; Vol: 68 (1); Pages: 319-329

Abstract

Recent advancements in deep learning have propelled the development of AI systems capable of generating music that resonates with human emotions and preferences. However, current music generation models still struggle to align generated music with detailed textual descriptions and to maintain consistency, especially for longer compositions. This paper presents an approach that addresses these challenges by integrating genre classification and retrieval-augmented generation (RAG) into the music generation pipeline. We train CNN architectures, including ResNet-50, GoogLeNet, and VGG16, for accurate genre classification. The classifier is then incorporated into a RAG framework, where the most relevant pre-classified music piece is retrieved based on the input text query. The retrieved audio and the text description are then fed into the MUSICGEN model to generate a new music piece that inherits attributes from both inputs. We evaluate our system through a double-blind human study, comparing the outputs of the original MUSICGEN model with those of our RAG-enhanced model. The results demonstrate a significant improvement in the ability of the RAG-enhanced model to generate music embodying specific stylistic elements, as evidenced by higher average confidence scores from participants. Our work represents a step towards more personalized and context-aware AI-generated musical experiences, laying the foundation for future work in this field.
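The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how relevance between the text query and the pre-classified pieces is scored, so the token-overlap (Jaccard) score below is an assumption chosen only to keep the example self-contained. The names `Track`, `retrieve`, and `generate_with_reference`, the `tags` field, and the `generator` callable (a stand-in for a MUSICGEN-style model that accepts text plus reference audio) are all hypothetical and not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """A library clip whose genre was assigned offline by a classifier."""
    path: str            # hypothetical location of the audio clip
    genre: str           # genre label predicted by the CNN classifier
    tags: tuple = ()     # hypothetical extra descriptors for the clip


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, track):
    """Jaccard overlap between query tokens and the track's label tokens.

    Assumption: the paper's actual relevance measure is not given in the
    abstract; this is only a simple placeholder.
    """
    q = tokenize(query)
    t = tokenize(" ".join((track.genre, *track.tags)))
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query, library):
    """Return the single most relevant pre-classified track for the query."""
    if not library:
        raise ValueError("library must contain at least one track")
    return max(library, key=lambda track: relevance(query, track))


def generate_with_reference(query, library, generator):
    """Retrieve a reference clip, then pass text and clip to the generator.

    `generator` is a hypothetical callable taking `text` and
    `reference_audio`; it stands in for the conditioned generation model.
    """
    reference = retrieve(query, library)
    return generator(text=query, reference_audio=reference.path)
```

In this sketch the classifier runs ahead of time, so each query only costs a scan over the labelled library; swapping `relevance` for an embedding-based similarity would leave the rest of the flow unchanged.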

Keywords:

Computer science; Music information retrieval; Consistency (knowledge bases); Pipeline (software); Classifier (UML); Deep learning; Context (archaeology); Artificial intelligence; Natural language processing; Musical; Speech recognition; Field (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 1.43

Citation Normalized Percentile: 0.71

Topics

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music Technology and Sound Studies

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Neuroscience and Music Perception

Life Sciences → Neuroscience → Cognitive Neuroscience
