Audio Source Separation Using Deep Neural Networks

Pritish Chandna

doi:10.5281/zenodo.3755621


DISSERTATION


Year: 2016

University: Zenodo (CERN European Organization for Nuclear Research)

Publisher: European Organization for Nuclear Research


Abstract

This thesis presents a low-latency online source separation algorithm based on convolutional neural networks. Building on ideas from previous research on source separation, we propose an algorithm using a deep neural network with convolutional layers. This type of neural network has led to state-of-the-art techniques for several image processing problems. We adapt these ideas to the audio domain, focusing on low-latency extraction of four tracks (vocals, bass, drums and other instruments) from a single-channel (monaural) musical recording. We use data compression to minimize the algorithm's processing time without compromising performance. The Mixing Secrets Dataset 100 (MSD100) and the Demixing Secrets Dataset 100 (DSD100) are used to evaluate the methodology. The results achieved by the algorithm show an 8.4 dB gain in SDR and a 9 dB gain in SAR for vocals over the state-of-the-art deep learning source separation approach using recurrent neural networks. The system's separation performance is comparable with that of other state-of-the-art algorithms, such as non-negative matrix factorization, while its processing time is significantly lower. This thesis is a stepping stone for further research in this area, particularly for implementing source separation algorithms for medical purposes such as speech enhancement for cochlear implants, a task that requires low latency.
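To give a sense of scale for the decibel gains quoted above, the following is a minimal sketch rather than the thesis's evaluation code. It assumes a simplified signal-to-distortion ratio (reference energy divided by error energy), which omits the projection and error decomposition used by the full BSS Eval metrics. The function names `simple_sdr_db` and `db_gain_to_power_ratio` are hypothetical and chosen for illustration only.

```python
import math


def simple_sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy.

    Assumption: this is NOT the full BSS Eval SDR, which first projects
    the estimate onto the reference and splits the error into
    interference, noise and artifact terms.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def db_gain_to_power_ratio(gain_db):
    """Convert a gain in dB to the corresponding ratio of energy ratios."""
    return 10.0 ** (gain_db / 10.0)


# Toy signal: five cycles of a sine wave, with an estimate that is 10% too quiet.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
estimate = [0.9 * r for r in reference]

print(round(simple_sdr_db(reference, estimate), 1))  # about 20 dB
print(round(db_gain_to_power_ratio(8.4), 2))         # between 6.9 and 7.0
```

Under this simplified definition, an 8.4 dB gain corresponds to a signal-to-distortion energy ratio roughly seven times higher than the baseline's.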

Keywords:

Artificial neural network; Source separation; Computer science; Separation (statistics); Deep neural networks; Speech recognition; Artificial intelligence; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.30

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing


Related Documents

Monoaural Audio Source Separation Using Deep Convolutional Neural Networks

Multichannel Audio Source Separation With Deep Neural Networks

Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks

Audio source separation using supervised deep neural network