Fine-grained action recognition is important for many applications, including human-robot interaction, automated skill assessment, and surveillance. The goal is to segment and classify all actions occurring in a time series sequence. While recent recognition methods have shown strong performance in robotics applications, they often require hand-crafted features, use large amounts of domain knowledge, or employ overly simplistic representations of how objects change throughout an action. In this paper we present the Latent Convolutional Skip Chain Conditional Random Field (LC-SC-CRF). This time series model learns a set of interpretable and composable action primitives from sensor data. We apply our model to cooking tasks using accelerometer data from the University of Dundee 50 Salads dataset and to robotic surgery training tasks using robot kinematic data from the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our performance on 50 Salads and JIGSAWS is 18.0% and 5.3% higher than the state of the art, respectively. This model performs well without requiring hand-crafted features or intricate domain knowledge. The code and features are publicly available.
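As a rough illustration only (not the paper's implementation), two ingredients suggested by the model's name can be sketched: unary class scores produced by convolving learned temporal filters with the sensor features, and decoding with pairwise terms that link each frame to the frame `skip` steps earlier, as in a skip-chain CRF. All names and shapes below are assumptions for the sketch.

```python
import numpy as np

def conv_scores(X, filters):
    """Unary scores from learned temporal filters (hypothetical shapes).

    X: (D, T) sensor features over T frames.
    filters: (C, D, d) one temporal filter of width d per action class.
    Returns S: (C, T) per-frame class scores.
    """
    C, D, d = filters.shape
    T = X.shape[1]
    pad = d // 2
    Xp = np.pad(X, ((0, 0), (pad, pad)), mode="edge")
    S = np.zeros((C, T))
    for c in range(C):
        for t in range(T):
            # Correlate the class filter with a local window of features.
            S[c, t] = np.sum(filters[c] * Xp[:, t:t + d])
    return S

def skip_chain_decode(S, A, skip=1):
    """Viterbi decoding with pairwise terms linking y_t to y_{t-skip}.

    S: (C, T) unary scores; A: (C, C) pairwise score A[prev, cur].
    With skip=1 this is ordinary chain Viterbi; skip>1 gives the
    skip-chain variant (independent chains over frame residues).
    """
    C, T = S.shape
    V = np.full((C, T), -np.inf)
    back = np.zeros((C, T), dtype=int)
    V[:, :skip] = S[:, :skip]           # chain starts carry unary scores only
    for t in range(skip, T):
        trans = V[:, t - skip][:, None] + A   # (prev, cur) combined scores
        back[:, t] = np.argmax(trans, axis=0)
        V[:, t] = S[:, t] + np.max(trans, axis=0)
    # Backtrack: pick the best label at the end of each chain, then follow
    # the backpointers skip frames at a time.
    y = np.empty(T, dtype=int)
    for t in range(max(T - skip, 0), T):
        y[t] = np.argmax(V[:, t])
    for t in range(T - 1, skip - 1, -1):
        y[t - skip] = back[y[t], t]
    return y
```

For example, with one feature dimension, width-1 filters `+1` and `-1` for two classes, and zero pairwise scores, the decoder simply recovers the per-frame argmax labeling.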