To address the problems of current sitting-posture recognition methods, such as low recognition accuracy, susceptibility to environmental interference, and inability to adapt to viewpoint changes, a deep fusion neural network model based on VGG, ResNet, and Vision Transformer is proposed. First, the image data are augmented by random cropping and random flipping; second, shallow features are extracted by the VGG network, and identity mappings are added by the ResNet residual network to extract multi-dimensional image features; finally, the fused feature information is fed into the Vision Transformer network for sitting-posture recognition. Experimental results show that the fusion model reaches a recognition accuracy of 98.3%, an improvement of 22.9%, 12.8%, 7.8%, 17.5%, 27.5%, and 20% over the traditional VGG, ResNet, Vision Transformer, MobileNet, EfficientNet, and Inception networks, respectively. The fusion model adapts to viewpoint changes of up to 45° and combines the advantages of the individual models; its localization of key feature regions is more complete and accurate, which improves recognition accuracy.
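The three-stage pipeline described above (VGG-style shallow convolutions, a ResNet-style residual block with an identity shortcut, and a Vision-Transformer-style encoder over the fused feature map) can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: the class name `FusionSittingNet`, the channel width, the number of encoder layers, and the class count are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class FusionSittingNet(nn.Module):
    """Hypothetical sketch of the VGG + ResNet + ViT fusion pipeline
    from the abstract; all layer sizes are illustrative assumptions."""

    def __init__(self, num_classes: int = 8, dim: int = 64):
        super().__init__()
        # Stage 1 (VGG-style): stacked 3x3 convolutions extract shallow features.
        self.vgg_stem = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # Stage 2 (ResNet-style): convolutional branch whose output is added
        # to the identity (skip) mapping of its input.
        self.res_branch = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim),
        )
        # Stage 3 (ViT-style): spatial positions become a token sequence fed
        # through a transformer encoder; the mean token is classified.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.vgg_stem(x)                    # shallow VGG features
        x = torch.relu(x + self.res_branch(x))  # residual: F(x) + x
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))    # (B, num_classes) logits

model = FusionSittingNet()
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 8])
```

Random cropping and flipping would be applied to the input images before this forward pass; the residual addition in stage 2 is what the abstract refers to as adding identity mappings to extract multi-dimensional features.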