Fall accidents are increasing, and detecting them in real time with CCTV systems remains challenging. This paper compares the performance of YOLOv11 and RT-DETRv2 for real-time fall detection. Experimental results show that YOLOv11 outperforms RT-DETRv2 in inference speed, making it more suitable for real-time applications. Unlike earlier studies, we apply feature map-based knowledge distillation during training to improve model performance. The proposed YOLO-based fall detection system transfers intermediate representations from a teacher network to a student network and optimises two complementary objectives: spatial alignment via Mean-Squared-Error (MSE) loss and channel-wise distribution alignment via Kullback–Leibler (KL) divergence. In our experiments, distillation improved the mean Average Precision (mAP) and reduced processing time by 0.8 ms. Evaluation on the AI-hub abnormal behavior datasets confirmed a 0.02 increase in both accuracy and F1-score, demonstrating the effectiveness of the proposed distillation method in real-time environments.
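To make the two distillation objectives concrete, the following is a minimal NumPy sketch of how an MSE term and a channel-wise KL term could be combined for a single pair of teacher and student feature maps. It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the weights `alpha` and `beta`, the temperature `tau`, and the choice to normalise each channel with a softmax over its spatial positions are all hypothetical, and the teacher and student maps are assumed to already have the same shape `(C, H, W)`.

```python
import numpy as np


def spatial_mse_loss(student, teacher):
    """Mean-squared error over all elements of two (C, H, W) feature maps."""
    return float(np.mean((student - teacher) ** 2))


def _channel_log_softmax(feat, tau):
    """Log-softmax over the spatial positions of each channel.

    Assumption: each channel is turned into a probability distribution
    over its H*W locations; the abstract does not specify this detail.
    """
    c = feat.shape[0]
    logits = feat.reshape(c, -1) / tau
    # Subtract the per-channel maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def channel_kl_loss(student, teacher, tau=1.0):
    """KL(teacher || student) per channel, averaged over channels."""
    log_p_t = _channel_log_softmax(teacher, tau)
    log_p_s = _channel_log_softmax(student, tau)
    kl_per_channel = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=1)
    # The tau**2 factor is the usual rescaling when a temperature is used.
    return float(tau ** 2 * kl_per_channel.mean())


def distillation_loss(student, teacher, alpha=1.0, beta=1.0, tau=1.0):
    """Weighted sum of the two terms (alpha and beta are hypothetical weights)."""
    mse = spatial_mse_loss(student, teacher)
    kl = channel_kl_loss(student, teacher, tau)
    return alpha * mse + beta * kl
```

In a training loop, a term of this kind would typically be added to the detector's own loss, with the teacher's feature maps treated as constants. The MSE term penalises element-wise differences in activation values, while the KL term compares only where each channel concentrates its response, which is why the two are described as complementary.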
Guangliang Zhu, Chunxia Yuan, Fei Jiang
Fei An, MingEn Zhong, Yihong Zhang, Bingan Yuan, Jiawei Tan, Kang Fan
Ze Yang, Xianliang Jiang, Guang Jin, Junkai Huang, Jie Bai, Dingxin Yu