An anomalous sound detection (ASD) system detects substantial deviations from the norm and reports the degree of abnormality through an anomaly score. An important application scenario is the detection of malfunctions in factory machinery. Recent approaches train autoencoders on small segments of the sound's time-frequency representation and use the reconstruction error as a measure of abnormality. However, it was recently shown that this approach leads to consistently higher reconstruction errors for the edge frames of the segments. To alleviate this problem, the Interpolation Deep Neural Network (IDNN) predicts the center frame of a segment from the remaining context frames. In this work, we propose DRINK (Deep Recurrent INterpolation NetworKs), an extension of the IDNN that allows a variable number of center and context frames. Moreover, we use a Long Short-Term Memory (LSTM) network to explicitly account for the sequential nature of sound, in contrast to the simple feed-forward networks of the original work. We show that, with a suitable choice of context and center frames, our method outperforms the IDNN and autoencoder baselines on a dataset of factory-machinery recordings in 13 out of 16 cases.
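The core data-handling step behind both the IDNN and its DRINK extension can be illustrated with a minimal sketch: splitting a spectrogram segment into context and center frames, and scoring abnormality by the reconstruction error on the center frames only. This is an assumed, simplified illustration (the function names, segment shapes, and the identity "model" below are hypothetical, not taken from the paper); a real implementation would replace the prediction step with a trained feed-forward or LSTM interpolation network.

```python
import numpy as np

def split_segment(segment, n_center):
    """Split a spectrogram segment into context and center frames.

    segment: array of shape (n_frames, n_mels), a small time-frequency excerpt.
    n_center: number of middle frames the network must predict.
    Returns (context, center), where context concatenates the frames
    before and after the center block.
    """
    n_frames = segment.shape[0]
    start = (n_frames - n_center) // 2
    center = segment[start:start + n_center]
    context = np.concatenate(
        [segment[:start], segment[start + n_center:]], axis=0
    )
    return context, center

def anomaly_score(predicted_center, true_center):
    """Mean squared interpolation error over the center frames only,
    avoiding the edge-frame reconstruction bias of plain autoencoders."""
    return float(np.mean((predicted_center - true_center) ** 2))

# Hypothetical usage: a 5-frame, 64-bin segment with 1 center frame
# (the original IDNN setting; DRINK generalizes n_center).
segment = np.random.default_rng(0).normal(size=(5, 64))
context, center = split_segment(segment, n_center=1)
score = anomaly_score(center, center)  # perfect prediction -> score 0.0
```

In this framing, varying `n_center` (and hence the number of context frames) is exactly the knob the abstract refers to when it speaks of a variable number of center and context frames.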
Kaori Suefusa, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Yohei Kawaguchi
Elmar Messner, Matthias Zöhrer, Franz Pernkopf
Minh-Hieu Nguyen, Duy-Quang Nguyen, Dinh-Quoc Nguyen, Cong-Nguyen Pham, Dai Bui, Huy-Dung Han
Phurich Saengthong, Takahiro Shinozaki
Doha Abdelhady, Youssef Abdelrahman, H.A. Othman