JOURNAL ARTICLE

Speech Enhancement for Multimodal Speaker Diarization System

Rehan Ahmad, Syed Zubair, Hani Alquhayz

Year: 2020 Journal: IEEE Access Vol: 8 Pages: 126671-126680 Publisher: Institute of Electrical and Electronics Engineers

Abstract

A speaker diarization system identifies speaker-homogeneous regions in recordings in which multiple speakers are present. It answers the question "who spoke when?". Data sets for speaker diarization usually consist of telephone calls, meetings, TV/talk shows, broadcast news, and other multi-speaker recordings. In this paper, we present the performance of our proposed multimodal speaker diarization system under noisy conditions. Two types of noise, additive white Gaussian noise (AWGN) and realistic environmental noise, are used to evaluate the system. To mitigate the effect of noise, we propose adding an LSTM-based speech enhancement block to our diarization pipeline. This block is trained on a synthesized data set with more than 100 noise types to enhance the noisy speech. The enhanced speech is then used in the multimodal speaker diarization system, which utilizes a pre-trained audio-visual synchronization model to find the active speaker. High-confidence active speaker segments are then used to train the speaker-specific clusters on the enhanced speech. A subset of the AMI corpus consisting of 5.4 h of recordings is used in this analysis. For AWGN, the performance improvement of the LSTM model is comparable to that of the Wiener filter, while for realistic environmental noise, the LSTM model yields a significantly larger improvement than the Wiener filter in terms of diarization error rate (DER).
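The abstract reports results as diarization error rate (DER). As a rough illustration only, the sketch below computes DER at the frame level using the common definition: missed speech plus false alarm plus speaker confusion, divided by total reference speech. It is not the authors' scoring code. The function name `diarization_error_rate`, the per-frame label format, and the toy labels are assumptions made for this example. It also assumes no overlapping speech and that hypothesis speaker labels have already been mapped to reference labels.

```python
def diarization_error_rate(reference, hypothesis):
    """Frame-level DER for non-overlapping speech (illustrative sketch).

    reference, hypothesis: equal-length sequences of per-frame labels,
    where None marks non-speech and any other value is a speaker ID.
    Hypothesis labels are assumed to be already mapped to reference IDs.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1       # reference speech left unlabeled
            elif hyp != ref:
                confusion += 1    # speech assigned to the wrong speaker
        elif hyp is not None:
            false_alarm += 1      # non-speech labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy labels (hypothetical): one confusion, one miss, one false alarm
# over six reference speech frames.
ref = ["A", "A", "A", "B", "B", None, None, "A"]
hyp = ["A", "A", "B", "B", None, None, "A", "A"]
der = diarization_error_rate(ref, hyp)
```

Standard scoring tools typically also search for the best speaker mapping and apply a forgiveness collar around segment boundaries. Both are omitted here for brevity.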

Keywords:
Speaker diarisation, Computer science, Speech recognition, Noise (video), Additive white Gaussian noise, Speaker recognition, Wiener filter, Speech enhancement, Voice activity detection, Speech processing, Artificial intelligence, White noise, Noise reduction, Telecommunications

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 1.33
Refs: 68
Citation Normalized Percentile: 0.81

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Speech Enhancement for Multimodal Speaker Diarization System

Rehan Ahmad, Syed Zubair, Hani Alquhayz

Journal: Greater South Information System Year: 2020
JOURNAL ARTICLE

Multimodal Speaker Diarization

A. Noulas, Gwenn Englebienne, Ben Kröse

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Year: 2011 Vol: 34 (1) Pages: 79-93
JOURNAL ARTICLE

Optimized Deep Embedded Clustering-Based Speaker Diarization with Speech Enhancement

S. Revathy, S. S. Kumar

Journal: Circuits Systems and Signal Processing Year: 2025