Accurate crowd counting was difficult in crowded scenes with dense occlusion. To address this, a Transformer-based crowd counting network with a fused attention mechanism was proposed. First, to adapt more efficiently to small-scale changes, the network was built on the VGG19 architecture and incorporated the ECANet attention mechanism to better integrate cross-channel interaction features. The output feature maps were then passed to the Transformer. Because the Transformer captures local information poorly and does not fuse features stably, a local attention module and a streaming attention module were added. Finally, a regression head with an attention mechanism was designed to obtain finer density maps and the predicted number of people. Extensive experiments on three challenging crowd counting datasets, namely UCF-QNRF, JHU-CROWD++, and NWPU, confirmed the effectiveness of the proposed method.
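The channel-attention step mentioned in the abstract can be illustrated with a minimal, dependency-free sketch of an ECA-style gate: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid, and channel-wise rescaling. This is not the authors' implementation. The function name `eca_attention`, the nested-list feature layout, and the fixed kernel weights are assumptions made for illustration; in a trained network the kernel would be learned and the operation would run on tensors.

```python
import math


def eca_attention(features, kernel):
    """Apply an ECA-style channel attention gate to a feature map.

    features: list of C channels, each a list of H rows of W floats.
    kernel:   odd-length list of 1D convolution weights (hypothetical fixed
              values here; learned during training in a real network).
    Returns (rescaled_features, channel_weights).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    num_channels = len(features)
    pad = len(kernel) // 2

    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]

    # 1D convolution across neighbouring channels with zero padding,
    # followed by a sigmoid so each weight lies in (0, 1).
    weights = []
    for c in range(num_channels):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = c + j - pad
            if 0 <= idx < num_channels:
                acc += w * pooled[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))

    # Rescale every value in a channel by that channel's attention weight.
    scaled = [
        [[value * weights[c] for value in row] for row in channel]
        for c, channel in enumerate(features)
    ]
    return scaled, weights


# Usage: three 2x2 channels and a kernel that mixes each channel
# with its two neighbours.
feats = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
scaled_feats, channel_weights = eca_attention(feats, [0.25, 0.5, 0.25])
```

Because the convolution spans only a few adjacent channels, the gate adds very few parameters, which is the usual motivation for this style of attention over a fully connected channel gate.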
Tao Wang, Ting Zhang, Kaibing Zhang, Huake Wang, Minqi Li, Jian Lü
LUAN Fangjun, GONG Qi, YUAN Shuai
Qing He, Qianqian Yang, Yinfeng Xia, Sifan Peng, Baoqun Yin
Jie Zou, Yingying Li, Zijian Hu, Yong Wang
Xiong Li, Huizi Deng, Yi Hu, Peng Huang, Qiyun Zhou