JOURNAL ARTICLE

Healthcare Data Analytics Challenge

Abstract

Online patient/caregiver support forums, such as cancer compass, ehealthforums, and patientslikeme, allow patients and caregivers to post health-related questions. Many of these forums contain a significant volume of repetitive questions. One possible reason for this repetition is that, as forums grow larger, patients and caregivers do not have the time or patience to read through previous questions before posting their own. The challenge is to design and implement a system that, for a new question q, identifies at most three existing questions that are most similar to q. In this challenge, we experimented with a variety of methods and representations for this task, including approaches that leveraged topic modeling, distributional semantics (word2vec), and term frequency-inverse document frequency (TF-IDF) to induce vector representations of questions. As similarity measures, we used cosine similarity and the rescaled dot product over these feature spaces. Despite our experimentation with more recent methods, we found that simple TF-IDF with stemming, combined with cosine similarity, appeared to give the best performance.
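The best-performing configuration described above (TF-IDF with stemming, ranked by cosine similarity, returning at most three matches) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothed IDF formula is an assumption, and `crude_stem`, `tokenize`, `tfidf_vectors`, `cosine`, and `top_similar` are hypothetical names. In particular, `crude_stem` is a rough suffix stripper standing in for a real stemmer such as Porter's.

```python
import math
import re
from collections import Counter


def crude_stem(token: str) -> str:
    """Rough suffix stripping; a hypothetical stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if a stem of at least three characters remains.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic runs, and stem each token."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Build sparse TF-IDF vectors (raw term count times smoothed IDF) for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothed IDF: terms present in every document still get a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_similar(new_question: str, archive: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Return up to k (index, score) pairs for archived questions most similar to new_question."""
    vectors, idf = tfidf_vectors([tokenize(q) for q in archive])
    # Query terms never seen in the archive have no IDF and are ignored.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(new_question)).items() if t in idf}
    scores = sorted(
        ((i, cosine(q_vec, v)) for i, v in enumerate(vectors)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # "At most k": drop candidates with no lexical overlap at all.
    return [pair for pair in scores[:k] if pair[1] > 0]
```

For example, with an archive of `["What are the side effects of chemotherapy?", "How long does radiation therapy take?", "Is hair loss a side effect of chemo?", "Best diet after surgery?"]`, the query `"side effects from chemotherapy treatment"` returns only indices 0 and 2, with the chemotherapy question ranked first. Note that "chemo" and "chemotherapy" do not match under purely lexical TF-IDF; this is the kind of gap that distributional representations such as word2vec are meant to address.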

Keywords:
Word2vec, Computer science, Cosine similarity, tf–idf, Similarity (geometry), Distributional semantics, Variety, Semantics (computer science), Information retrieval, Representation, Health care, Artificial intelligence, Data science, Term, Semantic similarity, Pattern recognition

Metrics

Cited by: 4
FWCI (Field Weighted Citation Impact): 0.31
References: 9
Citation Normalized Percentile: 0.81

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Healthcare data analytics challenge

Year: 2016 Pages: xli-xli
JOURNAL ARTICLE

IEEE ICHI Healthcare Data Analytics Challenge

Bisakha Ray

Year: 2015 Pages: 523-524
JOURNAL ARTICLE

Healthcare Data Analytics

Ivana Ognjanović

Journal: Studies in health technology and informatics Year: 2020 Vol: 274 Pages: 122-135
JOURNAL ARTICLE

Healthcare data analytics

Hui Yang, O. Erhun Kundakcioglu, Daniel Zeng

Journal: Information Systems and e-Business Management Year: 2015 Vol: 13 (4) Pages: 595-597