Anomalous sound detection is central to audio-based surveillance and monitoring. In a domestic environment, however, the classes of sounds to be considered anomalous are situation-dependent and cannot be determined in advance. At the same time, it is not feasible to expect a demanding labeling effort from the end user. To address these problems, we present a novel zero-shot method that relies on an auxiliary large-scale pretrained audio neural network to support an unsupervised anomaly detector. The auxiliary module is tasked with generating a fingerprint for each sound that the user occasionally registers. These fingerprints are then compared with those extracted from the input audio stream, and the resulting similarity score is used to increase or reduce the sensitivity of the base detector. Experimental results on synthetic data show that the proposed method substantially improves upon the unsupervised base detector and, without any additional training, can outperform existing few-shot learning systems developed for machine condition monitoring.
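The fingerprint comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure or how the sensitivity is modulated, so cosine similarity, the multiplicative rule, the `gain` parameter, and the split of registered sounds into "anomalous" and "normal" fingerprints are all assumptions made for illustration. The embeddings stand in for the output of the pretrained audio network.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def adjusted_score(base_score, embedding, anomalous_prints, normal_prints, gain=0.5):
    """Scale the base detector's anomaly score using registered fingerprints.

    Hypothetical rule: similarity to a sound registered as anomalous raises
    the score, similarity to a sound registered as normal lowers it.
    With 0 <= gain <= 0.5 the scaling factor stays within [0, 2].
    """
    sim_anom = max(
        (cosine_similarity(embedding, f) for f in anomalous_prints), default=0.0
    )
    sim_norm = max(
        (cosine_similarity(embedding, f) for f in normal_prints), default=0.0
    )
    return base_score * (1.0 + gain * (sim_anom - sim_norm))


if __name__ == "__main__":
    frame = [1.0, 0.0]  # toy embedding of the current audio frame
    print(adjusted_score(1.0, frame, anomalous_prints=[[1.0, 0.0]], normal_prints=[]))
    print(adjusted_score(1.0, frame, anomalous_prints=[], normal_prints=[[1.0, 0.0]]))
```

Because the adjustment only rescales the base score, a sketch like this requires no retraining of either the base detector or the embedding network, which matches the zero-shot setting described in the abstract.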
Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley
Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu
Madhulika Yarlagadda, Susrutha Ettimalla, Bhanu Sri Davuluri