Crowd counting is an extremely challenging task because crowded scenes vary widely in density and appearance. In this paper, we present a deep learning framework for crowd counting from a single static image containing a varying number of people and captured from an arbitrary perspective. For the convolutional neural network structure, we employ the VGG16 model but drop its fully connected layers. High-level features are combined with low-level features through a laterally connected feature pyramid network using element-wise addition, which preserves higher resolution and provides more context information. Extensive experiments are conducted on the ShanghaiTech and UCF_CC_50 datasets. The results show that our model achieves the lowest mean absolute error (MAE) and a comparable mean squared error (MSE), outperforming current state-of-the-art methods in MAE.
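To make the fusion step concrete, the following is a minimal NumPy sketch of a top-down pathway with lateral connections and element-wise addition. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`conv1x1`, `upsample2x`, `top_down_fuse`), the use of 1x1 lateral convolutions, nearest-neighbour upsampling, the 64-channel width, and the feature-map shapes are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: mixes channels at each pixel.
    x has shape (C_in, H, W); w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(features, lateral_weights):
    """Fuse feature maps ordered from low-level (high resolution)
    to high-level (low resolution) by element-wise addition."""
    # Start from the coarsest, most semantic map.
    fused = conv1x1(features[-1], lateral_weights[-1])
    # Walk towards the finer maps, upsampling and adding each lateral branch.
    for feat, w in zip(reversed(features[:-1]), reversed(lateral_weights[:-1])):
        fused = upsample2x(fused) + conv1x1(feat, w)
    return fused

rng = np.random.default_rng(0)
# Hypothetical shapes standing in for three convolutional stages of a
# VGG16-style backbone with the fully connected layers removed.
channels = [256, 512, 512]
sizes = [32, 16, 8]
features = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
# Assumed lateral 1x1 convolutions projecting every stage to 64 channels.
lateral_weights = [rng.standard_normal((64, c)) * 0.01 for c in channels]

fused = top_down_fuse(features, lateral_weights)
print(fused.shape)  # (64, 32, 32)
```

The fused map keeps the spatial resolution of the lowest-level input while carrying information from the deeper stages, which is the property the abstract attributes to the lateral connections.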
Hao Ma, Baoqun Yin, Luyang Wang, Hao Shi
Jianyong Wang, Lu Wang, Fenglei Yang
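The two error metrics named in the abstract can be sketched as follows. This is an illustrative sketch only: it assumes the convention common in the crowd counting literature, in which the quantity reported as MSE is the square root of the mean squared count error. The abstract does not state its exact definition, and the function names and sample counts below are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mse(predicted, actual):
    """Assumed convention: root of the mean squared count error."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical per-image counts, not results from the paper.
predicted = [105.0, 98.0, 310.0]
actual = [100.0, 102.0, 300.0]

print(mae(predicted, actual))
print(mse(predicted, actual))
```

Under this convention MAE reflects the average accuracy of the estimates, while MSE penalizes large per-image errors more heavily.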