Over the last decade, various detection mechanisms for spoofed speech have been proposed. Thus far, development has focused on detection accuracy and has largely ignored secondary goals such as computational complexity and storage requirements. In this work, we use empirical mode decomposition to compute intrinsic mode functions, which are then demodulated to obtain features consisting of short-time statistics of instantaneous amplitude and instantaneous frequency. These features are then used with a simple k-nearest neighbours classifier. We further show that voiced segments from short speech signals can be used for feature extraction, resulting in a spoofing detector that is competitive with top-performing systems while requiring up to 103× less computation.
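The feature pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices in it are assumptions. The empirical mode decomposition step is skipped, and a synthetic pure tone stands in for a single intrinsic mode function. Demodulation is done with an FFT-based analytic signal. The short-time statistics are taken to be the per-frame mean and standard deviation, averaged over frames. The names `analytic_signal`, `ia_if_features`, `knn_predict` and `make_tone`, and the frame length, are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: zero the negative frequencies, double the positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def ia_if_features(imf, fs, frame_len=200):
    """Short-time statistics of instantaneous amplitude (IA) and frequency (IF).

    Assumption: mean and standard deviation per frame, averaged over frames.
    """
    z = analytic_signal(imf)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # in Hz, one sample shorter
    inst_amp = np.abs(z)[:-1]                        # align with inst_freq
    n_frames = len(inst_freq) // frame_len
    a = inst_amp[:n_frames * frame_len].reshape(n_frames, frame_len)
    f = inst_freq[:n_frames * frame_len].reshape(n_frames, frame_len)
    per_frame = np.stack([a.mean(1), a.std(1), f.mean(1), f.std(1)], axis=1)
    return per_frame.mean(axis=0)

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.argmax(np.bincount(train_y[nearest])))

fs = 8000  # sampling rate in Hz (assumed)

def make_tone(freq_hz, duration_s=0.1):
    """Synthetic stand-in for one intrinsic mode function."""
    t = np.arange(int(fs * duration_s)) / fs
    return np.sin(2.0 * np.pi * freq_hz * t)

# Toy training set: label 0 for tones near 200 Hz, label 1 for tones near 400 Hz.
train_freqs = [190, 200, 210, 390, 400, 410]
train_x = np.array([ia_if_features(make_tone(f), fs) for f in train_freqs])
train_y = np.array([0, 0, 0, 1, 1, 1])

pred_low = knn_predict(train_x, train_y, ia_if_features(make_tone(230), fs))
pred_high = knn_predict(train_x, train_y, ia_if_features(make_tone(370), fs))
print(pred_low, pred_high)
```

In a real system, each utterance would first be decomposed into several intrinsic mode functions, and the statistics from all of them would be concatenated into one feature vector before classification. The two toy labels here only mimic the genuine versus spoofed decision.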