Crowd counting estimates the number of people in a scene, usually by predicting a crowd density map, and is a task of considerable practical importance. This paper focuses on multi-view crowd counting, which uses images from multiple views to predict the crowd density and crowd distribution of a scene. We propose a crowd counting model that combines convolutional neural networks with a transformer and uses a single shared view branch to extract features from the images of every view. The per-view feature maps are projected into a common world space and fused, and the resulting scene-level feature maps are then regressed to a scene-level density map. We evaluate the method on the PETS2009 and CityStreet datasets. The experiments show that our method achieves good accuracy, especially for dense, small objects.
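The project-then-fuse idea can be illustrated with a minimal sketch. It is not the model described above: the learned CNN and transformer features are replaced by hand-made Gaussian density maps, the learned fusion by a plain average, and the camera geometry by made-up 3x3 homographies. All function names, homographies, and coordinates are hypothetical and chosen only for illustration.

```python
import math

def apply_homography(H, x, y):
    """Map an image point (x, y) to ground-plane coordinates with a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def density_map(points, height, width, sigma=1.5):
    """Add one normalized Gaussian per person, so the map sums to the head count."""
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        kernel = [[math.exp(-((c - px) ** 2 + (r - py) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        total = sum(map(sum, kernel))  # normalize so each person contributes exactly 1
        for r in range(height):
            for c in range(width):
                grid[r][c] += kernel[r][c] / total
    return grid

def fuse_views(maps):
    """Average per-view ground-plane maps (an assumed stand-in for learned fusion)."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)] for r in range(rows)]

def count(grid):
    """The predicted count is the sum over the density map."""
    return sum(map(sum, grid))

# Hypothetical setup: view A already matches the ground grid,
# view B sees the same three people at twice the scale.
H_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
H_b = [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 1]]
people_a = [(4, 5), (10, 12), (15, 3)]
people_b = [(8, 10), (20, 24), (30, 6)]

ground_a = [apply_homography(H_a, x, y) for x, y in people_a]
ground_b = [apply_homography(H_b, x, y) for x, y in people_b]

fused = fuse_views([density_map(ground_a, 20, 20), density_map(ground_b, 20, 20)])
scene_count = count(fused)
```

Because both views agree once they are mapped to the ground plane, the fused map still sums to three people. In a learned model the projection would be applied to feature maps rather than to annotated points, and the fusion and regression would be trained end to end.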