JOURNAL ARTICLE

A Comparative Study of Swin Transformer and CNN Models for Crowd Counting

Jiayi Guo

Year: 2024, Journal: Applied and Computational Engineering, Vol: 94 (1), Pages: 100-105

Abstract

Crowd counting is a critical component of management and safety planning for large gatherings and public spaces, where it helps events run smoothly and prevents overcrowding. Standard convolutional neural network (CNN) based models perform well in head counting tasks, but they have drawbacks when applied to complex scenarios. With the rapid development of artificial intelligence, Transformer models that rely on self-attention mechanisms, such as Swin Transformer, have recently demonstrated exceptional performance in visual tasks such as image classification and segmentation. This study examines experimental results for Swin Transformer on head counting tasks and compares them with those of a CNN-based model. Evaluation with Mean Absolute Error (MAE) and Mean Square Error (MSE) shows that the Transformer model generalizes better than the classic CNN model in complicated scenarios. Future research will increase the diversity of the data sets and focus on optimizing the model structure and improving training efficiency.
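The two evaluation indicators named in the abstract can be illustrated with a minimal sketch. It assumes each test image has one predicted head count and one ground-truth count; the function names and the sample counts below are hypothetical and are not taken from the article. The sketch uses the standard definitions of MAE and MSE. Crowd counting papers often report the square root of the mean squared error under the name MSE, and the abstract does not say which convention the study follows.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true per-image counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_square_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average squared difference between predicted and true per-image counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical per-image counts, for illustration only (not data from the study).
predicted_counts = [105.0, 48.0, 310.0, 22.0]
true_counts = [100.0, 50.0, 300.0, 25.0]

mae = mean_absolute_error(predicted_counts, true_counts)
mse = mean_square_error(predicted_counts, true_counts)
print(f"MAE = {mae}, MSE = {mse}")  # prints "MAE = 5.0, MSE = 34.5"
```

Lower values are better for both indicators. MAE reflects the typical counting error per image, while MSE penalizes large errors more heavily, so it is more sensitive to a few badly miscounted scenes.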

Keywords:
Computer science; Transformer; Convolutional neural network; Artificial intelligence; Segmentation; Machine learning; Data mining; Engineering; Voltage

Topics

Air Quality Monitoring and Forecasting
Physical Sciences → Environmental Science → Environmental Engineering
Video Surveillance and Tracking Methods
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence