Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, the large quantity of unstructured and diverse content generated on these forums makes it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find relevant information more accurately and efficiently by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums, and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1,200 HealthBoards posts spanning four forum topics. Experimental results show that an SVM with pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics (e.g., training on allergy posts, testing on depression posts). Finally, we run a trained classifier on a MedHelp dataset to analyze the distribution of intents of posts from different forum topics.
Thomas Zhang, Jason H. D. Cho, ChengXiang Zhai
Guirong Chen, Ning Wang, Fengqin Zhang, Hua Jiang
Paul Marshall, Neil Caton, Zoe Glossop, Steven Jones, Rachel Meacock, Paul Rayson, Heather Robinson, Fiona Lobban
Xiaolin Shi, Jun Zhu, Rui Cai, Lei Zhang