Descriptions of symptoms in Chinese are rich and varied, and the components that make up a symptom are complex and changeable. Recognizing the components of Chinese symptoms is an important step in transforming unstructured electronic medical records into structured ones, and it helps capture the full information that a symptom conveys. It is also the foundation of symptom standardization and condition quantification. In this paper, we first propose a model of Chinese symptom composition that classifies symptom components into eleven types, such as atom symptoms, body parts, and headwords. We then treat component recognition as a sequence labeling problem and solve it with a bidirectional LSTM-CRF combined with part-of-speech features and data augmentation. Experiments show that our method achieves the best performance, with accuracies of 92.77% and 94.34% at the symptom and component levels, respectively. These results are 20.72% and 14.42% higher than those of the base model.
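To make the sequence labeling formulation concrete, the following minimal sketch converts annotated component spans into per-character BIO tags, the kind of target sequence a tagger such as a bidirectional LSTM-CRF could be trained to predict. The BIO scheme, the tag names `BODY` and `ATOM`, the helper `spans_to_bio`, and the example symptom are all assumptions made for illustration; the abstract does not specify the labeling scheme or the annotation format.

```python
from typing import List, Tuple


def spans_to_bio(chars: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, component_type) spans to BIO tags.

    Hypothetical helper: the tag scheme and type names are illustrative
    assumptions, not taken from the paper.
    """
    tags = ["O"] * len(chars)
    for start, end, ctype in spans:
        # Reject spans that fall outside the character sequence.
        if not (0 <= start < end <= len(chars)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        # Reject spans that overlap an already tagged component.
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        tags[start] = f"B-{ctype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{ctype}"
    return tags


# Illustrative symptom: "上腹部疼痛" (upper abdominal pain), split into an
# assumed body-part component and an assumed atom-symptom component.
symptom = list("上腹部疼痛")
spans = [(0, 3, "BODY"), (3, 5, "ATOM")]
tags = spans_to_bio(symptom, spans)

for ch, tag in zip(symptom, tags):
    print(ch, tag)
```

Under this framing, each character receives exactly one tag, so recognizing components reduces to predicting the tag sequence for the symptom string.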
Jianyong Duan, Bing Wang, Zheng Tan, Xiaopeng Wei, Hao Wang
Xuemin Yang, Zhihong Gao, Yongmin Li, Chuandi Pan, Ronggen Yang, Lejun Gong, Geng Yang
Li Yang, Ying Li, Jin Wang, Zhuo Tang
Dongyang Zhao, Jiuming Huang, Yan Jia
Hongbin Wang, Haibing Wei, Jianyi Guo, Liang Cheng