B. Li, H.-J. Zhang, Xingang Wang, Rui Cao, Jinyan Zhou
In this paper, we propose a memory-guided dual-stream spatio-temporal encoder network (MSTAE) with U-Net as the backbone. The spatial stream uses a time displacement module to obtain the spatial features of the video, and the temporal stream aggregates information across frames to obtain its temporal features. A coordinate attention module is introduced into the U-Net to strengthen its representation of dynamic entities. Deep networks generalise so well that they can predict anomalous frames with only a small error relative to the true frame, which makes anomalies hard to separate from normal events. To counter this, a memory module records the prototype patterns of normal data, so that anomalies are no longer predicted with such a small error. We conducted extensive experiments on three publicly available standard datasets (Ped2, Avenue and ShanghaiTech). The experiments show that the proposed model outperforms state-of-the-art methods.
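The abstract does not say how the memory module is addressed. As an illustration only, the sketch below shows one common way such a module can read from stored prototypes: each query feature is compared with every memory item by cosine similarity, the similarities are turned into weights with a softmax, and the read-out is the weighted sum of the items. The function name `memory_read`, the array shapes, and the choice of cosine similarity are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_read(queries, memory):
    """Read from a prototype memory (hypothetical helper, not the paper's code).

    queries: (N, C) encoder features used as queries.
    memory:  (K, C) stored prototype items of normal data.
    Returns the read-out (N, C) and the addressing weights (N, K).
    """
    # Assumption: cosine similarity is used to match queries to items.
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-12)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-12)
    similarity = q @ m.T                    # (N, K)
    weights = softmax(similarity, axis=1)   # each row sums to 1
    read = weights @ memory                 # weighted sum of prototypes
    return read, weights
```

Because every read-out is a combination of prototypes learned from normal data, features of an anomalous frame can only be approximated by normal patterns. That is one way a memory can limit the generalisation the abstract describes.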
Yunlong Wang, Mingyi Chen, Jiaxin Li, Hongjun Li
Guodong Shen, Yuqi Ouyang, Víctor Sánchez
Weijia Liu, Jiuxin Cao, Yilin Zhu, Bo Liu, Xuelin Zhu
Edgar Santos-Fernández, Jay M. Ver Hoef, Erin E. Peterson, James McGree, Cesar A. Villa, Catherine Leigh, Ryan Turner, Cameron Roberts, Kerrie Mengersen