Linna Zhang, Lanyao Zhang, Qi Cao, Shichao Kan, Yigang Cen, Fugui Zhang, Yingyu Huang
The goal of a feature reconstruction network based on an autoencoder is to force the network, during training, to reconstruct its input features well. Such a network tends to learn an "identity mapping" shortcut, so that in the inference phase abnormal features pass through the network unchanged. As a result, abnormal features cannot be distinguished from normal ones by their reconstruction error, which significantly limits the detection performance of such methods. To address this issue, we propose a feature transformation reconstruction (FTR) network, which avoids the identity mapping problem. Specifically, we use a normalizing flow model as a feature transformation (FT) network to transform the input features into another form. The training goal of the feature reconstruction (FR) network is then no longer to reconstruct the input features but to reconstruct the transformed features, effectively avoiding the "identity mapping" shortcut. Furthermore, this paper proposes a masked convolutional attention (MCA) module, which randomly masks the input features in the training phase and reconstructs them in a self-supervised manner. In the testing phase, MCA effectively suppresses the over-reconstruction of abnormal features and further improves anomaly detection performance. FTR achieves area under the receiver operating characteristic curve (AUROC) scores of 99.5% and 97.8% on the MVTec AD and BTAD datasets, respectively, outperforming other state-of-the-art methods. Moreover, FTR is faster than existing methods, running at 137 frames per second (FPS) on an NVIDIA RTX 3080 Ti GPU.
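The core idea — training the reconstruction network against a transformed version of the features rather than the features themselves, with random masking of the inputs — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the invertible orthogonal matrix stands in for the normalizing-flow FT network, and the masking function mimics the MCA module's random masking; all function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_transform(x, w):
    # Stand-in for the normalizing-flow FT network: an invertible
    # linear map (the paper uses a learned normalizing flow).
    return x @ w

def random_mask(x, ratio, rng):
    # MCA-style random masking of input features during training.
    keep = rng.random(x.shape) >= ratio
    return x * keep

# Toy features: a batch of 4 feature vectors of dimension 8.
x = rng.normal(size=(4, 8))
w = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # orthogonal, hence invertible

target = feature_transform(x, w)       # FT output is the training target
recon = random_mask(x, 0.3, rng) @ w   # placeholder for the FR network output

# Training objective: reconstruct the *transformed* features, not the
# input itself, so copying the input (identity mapping) is no longer
# a zero-loss shortcut.
loss = np.mean((recon - target) ** 2)
```

Because the target differs from the input by an invertible transformation, a network that merely copies its input incurs a nonzero loss, which is what breaks the identity-mapping shortcut in this sketch.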