A virtual, animatable hand avatar that represents a user's hand shape and appearance and tracks its articulated motion is essential for an immersive experience in AR/VR. Recent approaches use implicit representations combined with neural rendering to capture geometry and appearance. However, they fail to generalize to unseen shapes, do not model lighting, which leads to baked-in illumination and self-shadows, and cannot capture complex poses. In this thesis, we 1) introduce a novel hand shape model that augments a data-driven shape model and adapts its local scale to represent unseen hand shapes, 2) propose a method to reconstruct a detailed hand avatar from monocular RGB video captured under real-world environment lighting by jointly optimizing shape, appearance, and lighting parameters using a realistic shading model within a differentiable rendering framework that incorporates Monte Carlo path tracing, and 3) present a robust hand-tracking framework that accurately registers our hand model to monocular depth data using a modified skinning function with blend shapes. Our evaluation demonstrates that our approach outperforms existing hand shape and appearance reconstruction methods on all commonly used metrics. Furthermore, our tracking framework improves over existing generative and discriminative hand pose estimation methods.
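To make the skinning formulation concrete, the following is a minimal NumPy sketch of linear blend skinning with shape blend shapes in the standard MANO style. The function name, array shapes, and formulation are illustrative assumptions; the thesis's modified skinning function is described in the body of the work and may differ from this sketch.

```python
import numpy as np

def linear_blend_skinning(verts, weights, joint_transforms, blend_shapes, betas):
    """Deform template vertices with shape blend shapes followed by LBS.

    Hypothetical signature for illustration (MANO-style formulation):
      verts:            (V, 3) template vertices
      weights:          (V, K) per-vertex skinning weights (rows sum to 1)
      joint_transforms: (K, 4, 4) rigid transforms of the K skeleton joints
      blend_shapes:     (V, 3, B) per-vertex shape-corrective offsets
      betas:            (B,) shape coefficients
    """
    # Add shape-dependent corrective offsets to the template.
    shaped = verts + blend_shapes @ betas  # (V, 3)

    # Blend the per-joint rigid transforms by the skinning weights.
    blended = np.einsum("vk,kij->vij", weights, joint_transforms)  # (V, 4, 4)

    # Apply each vertex's blended transform in homogeneous coordinates.
    homo = np.concatenate([shaped, np.ones((len(shaped), 1))], axis=1)  # (V, 4)
    posed = np.einsum("vij,vj->vi", blended, homo)
    return posed[:, :3]
```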