The spatial perception of a sound image is significantly influenced by the degree of correlation between the sounds received by the two ears. Audio signal decorrelation is therefore a commonly used tool in various spatial audio processing applications. In this paper, we propose a novel approach to audio decorrelation using generative adversarial networks. As the generator, we employ a convolutional neural network architecture that was recently proposed for audio decorrelation. In contrast to that previous work, the loss function is defined directly w.r.t. the input audio signal, i.e., a decorrelated reference signal is not required. The training objective comprises several individual loss terms that control both the output-input correlation and the output signal quality. This makes it possible to tailor the training procedure specifically to the desired output signal properties and to potentially outperform conventional decorrelation techniques in both performance and flexibility. The proposed approach is compared to a state-of-the-art conventional decorrelation method by means of objective evaluations as well as listening tests, considering a variety of signal types.
Carlotta Anemüller, Oliver Thiergart, Emanuël A. P. Habets
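To illustrate the kind of training objective described in the abstract, the following is a minimal sketch of two hypothetical loss terms: a correlation loss that penalizes the normalized cross-correlation between the input and the generated output, and a spectral-magnitude loss as a simple proxy for output signal quality. The function names, weighting parameters, and FFT size are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def correlation_loss(x, y):
    """Squared normalized cross-correlation between input x and output y.
    Driving this toward zero encourages decorrelation (illustrative term)."""
    x = x - x.mean()
    y = y - y.mean()
    r = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
    return r ** 2

def magnitude_loss(x, y, n_fft=256):
    """Mean squared spectral-magnitude mismatch, a crude proxy for
    preserving the timbre (quality) of the input signal."""
    X = np.abs(np.fft.rfft(x, n=n_fft))
    Y = np.abs(np.fft.rfft(y, n=n_fft))
    return np.mean((X - Y) ** 2)

def total_loss(x, y, alpha=1.0, beta=1.0):
    """Weighted sum of the individual terms; the weights alpha and beta
    let one trade off decorrelation against signal quality."""
    return alpha * correlation_loss(x, y) + beta * magnitude_loss(x, y)
```

In a GAN setting, such terms would be added to the adversarial loss of the generator; the weights allow the training to be tailored toward stronger decorrelation or higher output quality, as described above.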