Owing to their flexibility and the variety of gestures they can form, human hands are an important means of communicating with others. As convolutional neural networks (CNNs) have advanced in image recognition, many researchers have applied them to hand pose estimation and achieved remarkable breakthroughs. However, as mobile phones and other portable devices continue to shrink, deploying an estimation model on such devices requires achieving hand pose estimation with as few model parameters and as little computation as possible. In this paper, we aim to perform hand pose estimation with fewer parameters than the state-of-the-art model. To keep the parameter count as low as possible, we use a 2D CNN as the backbone. Our model combines a variety of parameter-reducing operations, bringing the total number of parameters down to 1.5M, which is 21% fewer than the state-of-the-art model [1]. In addition, our model runs in real time at 48 frames per second. Finally, we evaluate our model on RHD, a widely used hand pose estimation dataset. Taking both accuracy and parameter count into account, the experimental results show that our method reaches 0.9713 on [email protected] and outperforms existing models with fewer than 10M parameters.
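To make the notion of a parameter-reducing operation concrete, the following minimal sketch compares the parameter count of a standard 2D convolution with that of a depthwise separable convolution. This is an illustrative assumption only: the abstract does not state which operations the model uses, and the function names and the layer sizes (128 channels, 3x3 kernel) are hypothetical.

```python
# Illustrative only: depthwise separable convolution is one common way to cut
# parameters; the abstract does not say which operations the model uses.

def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Parameter count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def relative_reduction(baseline, reduced):
    """Fraction of parameters saved relative to the baseline."""
    return 1.0 - reduced / baseline


# Hypothetical layer: 128 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(128, 128, 3)
separable = depthwise_separable_params(128, 128, 3)
print(standard, separable, round(relative_reduction(standard, separable), 3))
```

The same `relative_reduction` helper expresses a comparison such as "21% fewer parameters" between two whole models once their total parameter counts are known.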
Xingyu Liu, Pengfei Ren, Yuanyuan Gao, Jingyu Wang, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao
Danilo Avola, Luigi Cinque, Alessio Fagioli, Gian Luca Foresti, Adriano Fragomeni, Daniele Pannone