The rise in violent incidents in public and private places has created an urgent need for advanced, efficient surveillance systems. Traditional methods rely on manual monitoring, which is impractical and inefficient for detecting violence in complex scenarios. This paper presents a lightweight deep learning framework for real-time violence detection that uses knowledge distillation to improve public safety and security. We propose a teacher-student approach in which a large, pre-trained VGG16 model serves as the teacher and transfers its knowledge to a significantly smaller, custom-built CNN student model. The teacher is first trained on a Kaggle-based dataset of violent and non-violent images; the student is then trained with a combined distillation loss that balances hard targets (the ground-truth labels) and soft targets (the teacher's output distribution). The teacher model achieves an accuracy of 90.96%. The student model, with 7.20x fewer parameters, achieves an accuracy of 83.85%, retaining over 92% of the teacher's performance (83.85 / 90.96 ≈ 0.922). This framework offers a favorable trade-off between model size and accuracy, making it an efficient and scalable candidate for real-time deployment on mobile devices in smart city surveillance systems.
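To make the combined distillation loss concrete, the following is a minimal sketch for a single example. The abstract does not state the exact formulation, so everything here is an assumption: the weighting factor `alpha`, the softmax `temperature`, the temperature-squared scaling of the soft term, and the function names are hypothetical and follow the commonly used formulation of knowledge distillation rather than the authors' confirmed implementation.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=4.0):
    """Weighted sum of a hard-target and a soft-target loss (assumed form).

    alpha and temperature are illustrative values, not taken from the paper.
    """
    # Hard-target term: cross-entropy between the student's prediction
    # (temperature 1) and the ground-truth class index.
    hard = -log_softmax(student_logits)[label]

    # Soft-target term: KL divergence from the softened teacher
    # distribution to the softened student distribution.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    soft = sum(math.exp(lt) * (lt - ls)
               for lt, ls in zip(log_p_teacher, log_p_student))
    # Scale by T^2 so the soft term's gradient magnitude stays comparable
    # to the hard term's (a common convention, assumed here).
    soft *= temperature ** 2

    return alpha * hard + (1.0 - alpha) * soft

# Two classes: index 0 = non-violent, index 1 = violent (assumed ordering).
loss = distillation_loss(student_logits=[0.3, 1.1],
                         teacher_logits=[-1.0, 3.0],
                         label=1)
```

Setting `alpha=1.0` reduces this to ordinary cross-entropy training, while `alpha=0.0` trains the student purely to mimic the teacher; values in between give the balance of hard and soft targets described above.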