Wuque Cai, Hongze Sun, Rui Liu, Yan Cui, Jun Wang, Yang Xia, Dezhong Yao, Daqing Guo
Spiking neural networks (SNNs) mimic the computational strategies of the brain and exhibit substantial capabilities in spatiotemporal information processing. Visual attention, an essential factor in human perception, refers to the dynamic process of selecting salient regions in biological vision systems. Although visual attention mechanisms have achieved great success in computer vision applications, they are rarely introduced into SNNs. In the present study, inspired by experimental observations on predictive attentional remapping, we propose a new spatial-channel-temporal-fused attention (SCTFA) module that guides SNNs to efficiently capture underlying target regions by exploiting accumulated historical spatial-channel information. Through a systematic evaluation on three event-stream datasets (DVS Gesture, SL-Animals-DVS, and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves accuracy competitive with existing state-of-the-art (SOTA) methods. Moreover, our detailed analysis shows that the proposed SCTFA-SNN model is highly robust to noise and remarkably stable when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach for elevating the capabilities of SNNs.
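To make the general idea more concrete, the following is a minimal PyTorch-style sketch of a spatial-channel-temporal attention block that reweights an unrolled spike tensor using statistics accumulated over past time steps. The class name, the (T, B, C, H, W) tensor layout, and the running-sum treatment of historical information are illustrative assumptions for exposition only; they do not reproduce the authors' exact SCTFA design.

```python
import torch
import torch.nn as nn


class SpatialChannelTemporalAttentionSketch(nn.Module):
    """Hypothetical attention block for spike tensors of shape (T, B, C, H, W).

    At each time step, spatial-channel statistics accumulated from previous
    steps are used to reweight the current spike frame. Details are assumptions,
    not the published SCTFA module.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel branch: squeeze spatial dims, then excite channels.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial branch: 2-D conv over channel-pooled (avg + max) maps.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: spike tensor unrolled over T time steps.
        T, B, C, H, W = x.shape
        history = torch.zeros(B, C, H, W, device=x.device)
        outputs = []
        for t in range(T):
            # Accumulate historical spatial-channel information (running sum).
            history = history + x[t]
            # Channel weights from the globally pooled history.
            ch_descriptor = history.mean(dim=(2, 3))              # (B, C)
            ch_weights = self.channel_fc(ch_descriptor).view(B, C, 1, 1)
            # Spatial weights from channel-pooled history maps.
            sp_descriptor = torch.cat(
                [history.mean(dim=1, keepdim=True),
                 history.amax(dim=1, keepdim=True)], dim=1)       # (B, 2, H, W)
            sp_weights = self.spatial_conv(sp_descriptor)         # (B, 1, H, W)
            # Fuse: reweight the current frame with both attention maps.
            outputs.append(x[t] * ch_weights * sp_weights)
        return torch.stack(outputs, dim=0)
```

A usage sketch: applying the block to a random spike tensor, e.g. `SpatialChannelTemporalAttentionSketch(64)(torch.rand(10, 8, 64, 32, 32))`, returns a tensor of the same shape whose frames have been modulated by the accumulated attention maps, which is the role such a module would play inside an SNN layer stack.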