JOURNAL ARTICLE

Causal Diffusion Models for Generalized Speech Enhancement

Richter, Julius; Welker, Simon; Lemercier, Jean-Marie; Lay, Bunlong; Peer, Tal; Gerkmann, Timo

Year: 2024   Journal: DESY Publication Database (PUBDB) (Deutsches Elektronen-Synchrotron)   Publisher: Deutsches Elektronen-Synchrotron DESY

Abstract

In this work, we present a causal speech enhancement system that is designed to handle different types of corruptions. This paper is an extended version of our contribution to the “ICASSP 2023 Speech Signal Improvement Challenge”. The method is based on a generative diffusion model which has been shown to work well in scenarios beyond speech-in-noise, such as missing data and non-additive corruptions. We guarantee causal processing with an algorithmic latency of 20 ms by modifying the network architecture and removing non-causal normalization techniques. To train and test our model, we generate a new corrupted speech dataset which includes additive background noise, reverberation, clipping, packet loss, bandwidth reduction, and codec artifacts. We compare the causal and non-causal versions of our method to investigate the impact of causal processing and we assess the gap between specialized models trained on a particular corruption type and the generalized model trained on all corruptions. Although specialized models and non-causal models have a small advantage, we show that the generalized causal approach does not suffer from a significant performance penalty, while it can be flexibly employed for real-world applications where different types of distortions may occur.
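Three of the corruption types named in the abstract (clipping, packet loss, and additive noise) can be illustrated with a minimal sketch. This is not the authors' dataset pipeline: the function names, the clipping threshold, the 10 ms packet length, the loss probability, the 10 dB SNR, and the use of a sine tone and white Gaussian noise in place of real speech and background noise are all assumptions chosen for illustration.

```python
import math
import random

def clip(signal, threshold):
    """Hard-clip every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in signal]

def drop_packets(signal, packet_len, loss_prob, rng):
    """Zero out whole packets of packet_len samples, each with probability loss_prob."""
    out = list(signal)
    for start in range(0, len(out), packet_len):
        if rng.random() < loss_prob:
            for i in range(start, min(start + packet_len, len(out))):
                out[i] = 0.0
    return out

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the expected SNR equals snr_db."""
    power = sum(s * s for s in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

# One second of a 440 Hz tone at 16 kHz stands in for clean speech (assumption).
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate)]

rng = random.Random(0)  # fixed seed so the corruptions are reproducible
clipped = clip(clean, threshold=0.25)
# 160 samples = 10 ms at 16 kHz; the packet length is a hypothetical choice.
lossy = drop_packets(clean, packet_len=160, loss_prob=0.1, rng=rng)
noisy = add_noise(clean, snr_db=10.0, rng=rng)
```

Each function maps a clean signal to a corrupted one of the same length, so clean and corrupted versions can be paired sample by sample. Reverberation, bandwidth reduction, and codec artifacts are left out of the sketch because they need a room impulse response, a resampler, or an actual codec.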

Keywords:
Causal model; Normalization (sociology); Speech processing; Speech enhancement; Signal processing; Generative model; Causal inference; Speech coding

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.47

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Causal Diffusion Models for Generalized Speech Enhancement

Julius Richter; Simon Welker; Jean-Marie Lemercier; Bunlong Lay; Tal Peer; Timo Gerkmann

Journal:   IEEE Open Journal of Signal Processing Year: 2024 Vol: 5 Pages: 780-789

JOURNAL ARTICLE

Speech Enhancement with Generative Diffusion Models

O. V. Girfanov; A. G. Shishkin

Journal:   Automatic Documentation and Mathematical Linguistics Year: 2023 Vol: 57 (5) Pages: 249-257

JOURNAL ARTICLE

Speech Enhancement and Dereverberation With Diffusion-Based Generative Models

Julius Richter; Simon Welker; Jean-Marie Lemercier; Bunlong Lay; Timo Gerkmann

Journal:   IEEE/ACM Transactions on Audio Speech and Language Processing Year: 2023 Vol: 31 Pages: 2351-2364