SLIDES - Real-Time Multichannel Speech Separation And Enhancement Using A Beamspace-Domain-Based Lightweight CNN

Marco Olivieri; Luca Comanducci; Mirco Pezzoli; Davide Balsarri; Luca Menescardi; Michele Buccoli; Simone Pecorino; Antonio Grosso; Fabio Antonacci; Augusto Sarti

doi:10.60864/etfd-9658

JOURNAL ARTICLE

Year: 2023; Journal: IEEE SIGPORT

Abstract

Speech separation and speech enhancement concern the extraction of the speech emitted by a target speaker in scenarios where multiple interfering speakers or noise are present, respectively. Many practical applications, such as home assistants and teleconferencing, require some form of speech separation and enhancement as pre-processing before Automatic Speech Recognition (ASR) systems are applied. In recent years, most techniques have focused on applying deep learning to either time-frequency or time-domain representations of the input audio signals. In this paper, we propose a real-time multichannel speech separation and enhancement technique based on the combination of a directional representation of the sound field, denoted as beamspace, with a lightweight Convolutional Neural Network (CNN). We consider the case where the Direction-Of-Arrival (DOA) of the target speaker is approximately known, a scenario in which the power of the beamspace-based representation can be fully exploited, and we make no assumption regarding the identity of the talker. We present experiments in which the model is trained on simulated data and tested on real recordings, and we compare the proposed method with a similar state-of-the-art technique.
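To make the notion of a directional (beamspace) representation concrete, the following is a minimal sketch in Python. It is not the authors' implementation: it assumes a uniform linear array, a fixed delay-and-sum beamformer, a single frequency bin, and illustrative values for the number of microphones, spacing, and frequency. The function names `steering_matrix` and `to_beamspace` are hypothetical.

```python
import numpy as np

def steering_matrix(n_mics, spacing_m, freq_hz, angles_rad, c=343.0):
    """Far-field steering vectors of a uniform linear array (assumed geometry).

    Returns an array of shape (n_mics, n_beams), one column per look direction.
    Angles are measured from broadside.
    """
    mic_index = np.arange(n_mics)[:, None]
    delays = mic_index * spacing_m * np.sin(angles_rad)[None, :] / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def to_beamspace(x, steering):
    """Project one multichannel frequency bin onto each look direction
    (delay-and-sum), giving one complex value per beam."""
    return steering.conj().T @ x / steering.shape[0]

# Illustrative setup (all values are assumptions): 8 mics, 4 cm spacing, 2 kHz.
angles = np.deg2rad(np.arange(-90, 91, 5))
A = steering_matrix(8, 0.04, 2000.0, angles)

# Simulated plane wave arriving from 30 degrees.
target = np.deg2rad(30.0)
x = steering_matrix(8, 0.04, 2000.0, np.array([target]))[:, 0]

beams = to_beamspace(x, A)
peak_deg = np.rad2deg(angles[np.argmax(np.abs(beams))])
print(f"beam with maximum energy: {peak_deg:.1f} degrees")
```

In this sketch the beam steered toward the source direction carries the most energy, which illustrates why an approximately known DOA makes a beamspace input informative for a downstream network.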

Keywords:

Speech enhancement; Representation (politics); Noise (video); Speaker recognition; Convolutional neural network; Source separation; Voice activity detection; Spectrogram; Identity (music)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.57

Related Documents

Real-Time Multichannel Speech Separation and Enhancement Using a Beamspace-Domain-Based Lightweight CNN

PAPER - Real-Time Multichannel Speech Separation And Enhancement Using A Beamspace-Domain-Based Lightweight CNN

A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement

Multichannel Speech Enhancement in the Time Domain
