Crowd counting in still images is challenging because of heavy occlusion and large scale variation. In this paper, we aim to accurately estimate the crowd count from a single still image. Convolutional neural networks have recently proved effective in many computer vision tasks, including crowd counting. We therefore propose a fully convolutional network (FCN) architecture that maps an input image of arbitrary size or resolution to its density map. To address perspective and scale variation, we use Inception-like modules with filters of multiple kernel sizes to capture multi-scale features, which are necessary for higher counting performance. We test our model on the challenging ShanghaiTech dataset, and the results show that our method outperforms state-of-the-art methods.
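The idea of mapping an image of arbitrary size to a density map through parallel filters of several kernel sizes can be illustrated with a minimal NumPy sketch. It is not the architecture evaluated in this paper: the kernel sizes (1, 3, 5), the channel counts, the single-block depth, the ReLU on the output, and the random (untrained) weights are all assumptions chosen for illustration, and the names `conv2d_same`, `inception_like_block`, and `predict_density` are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, zero-padded 2D cross-correlation.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) with odd k.
    Returns (C_out, H, W), so spatial size is preserved.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]
            # Mix input channels with the kernel tap at offset (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def inception_like_block(x, branch_weights):
    """Run branches with different kernel sizes in parallel and
    concatenate their outputs along the channel axis."""
    return np.concatenate(
        [relu(conv2d_same(x, w)) for w in branch_weights], axis=0
    )


def predict_density(image, branch_weights, head_weight):
    """Fully convolutional mapping: image (3, H, W) -> density map (H, W)."""
    features = inception_like_block(image, branch_weights)
    # A 1x1 convolution fuses the multi-scale features into one channel.
    return relu(conv2d_same(features, head_weight))[0]


rng = np.random.default_rng(0)
kernel_sizes = (1, 3, 5)  # assumed, not taken from the paper
branch_weights = [rng.normal(0, 0.1, size=(4, 3, k, k)) for k in kernel_sizes]
head_weight = rng.normal(0, 0.1, size=(1, 4 * len(kernel_sizes), 1, 1))

# No fully connected layer is used, so any input size is accepted.
for h, w in [(32, 48), (57, 41)]:
    image = rng.random((3, h, w))
    density = predict_density(image, branch_weights, head_weight)
    # The estimated count is the integral (sum) of the density map.
    print(density.shape, float(density.sum()))
```

Because every layer is convolutional with size-preserving padding, the output density map has the same height and width as the input, and the count estimate is simply its sum. With trained weights in place of the random ones, the same structure would yield meaningful counts.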
Ming Liu, Jue Jiang, Zhenqei Guo, Zenan Wang, Yang Liu
Yaocong Hu, Huan Chang, Fudong Nian, Yan Wang, Teng Li
Jianyong Wang, Lu Wang, Fenglei Yang
Jinmeng Cao, Biao Yang, Yuyu Zhang, Ling Zou