Because crowd density varies greatly in real scenes, detection-based methods become less reliable in crowded areas. Existing approaches that apply detection-based transformer models to crowd localization are subject to the same constraint. The problem is even more pronounced in dense crowds, which contain many small targets. To address this issue, our model employs a context-aware module that extracts and fuses information at different scales, thereby handling potentially rapid scale changes, and uses a transformer to build an end-to-end crowd localization model. Extensive experiments show that our model adaptively learns contextual information for crowd localization and significantly outperforms previous advanced models.
Weizhe Liu, Mathieu Salzmann, Pascal Fua
Sarah Jad, Marwan Torki, Ayman Khalafallah
Lixian Yuan, Yandong Chen, Hefeng Wu, Wentao Wan, Pei Chen