Biomedical entity extraction is the process of identifying biomedical instances such as disorders, viruses, proteins and genes. One such instance, the chemical compound, has attracted considerable research attention because it is difficult to extract. Most studies on chemical compound extraction have relied on supervised machine learning techniques, which learn a statistical model rather than depending on handcrafted rules. The performance of these techniques, however, depends heavily on the features used. Previous studies have used a wide range of features for extracting chemical compounds, so a feature selection step is needed to determine the best combination. This paper therefore combines the Naïve Bayes classification method with the Wrapper Subset Selection approach to identify the best features. The results show that the proposed combination identifies the best feature set, consisting of Capitalization, Punctuation, Prefix and Part-Of-Speech Tagging, which achieves an F-measure of 0.72. Compared with the state of the art, this result is competitive.
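The general idea of wrapper-based feature selection around a Naïve Bayes classifier can be illustrated with a minimal sketch. It is not the paper's implementation: the greedy forward search, the three-fold split, the add-one smoothing, the function names and the twelve toy tokens are all assumptions made for illustration, and the scores it produces say nothing about the reported 0.72.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels, feats):
    """Fit a categorical Naive Bayes model on the chosen features."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> value counts
    values = defaultdict(set)            # feature -> values seen in training
    for row, y in zip(rows, labels):
        for f in feats:
            value_counts[(f, y)][row[f]] += 1
            values[f].add(row[f])
    return class_counts, value_counts, values

def predict_nb(model, row, feats):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for y, n in sorted(class_counts.items()):
        score = log(n / total)
        for f in feats:
            # Add-one smoothing, with one extra slot for unseen values.
            score += log((value_counts[(f, y)][row[f]] + 1) / (n + len(values[f]) + 1))
        if score > best_score:
            best, best_score = y, score
    return best

def f_measure(gold, pred, positive="CHEM"):
    """F-measure (F1) of the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def cv_score(rows, labels, feats, folds=3):
    """Cross-validated F-measure of Naive Bayes restricted to `feats`."""
    pred = [None] * len(rows)
    for k in range(folds):
        train = [i for i in range(len(rows)) if i % folds != k]
        model = train_nb([rows[i] for i in train], [labels[i] for i in train], feats)
        for i in range(k, len(rows), folds):
            pred[i] = predict_nb(model, rows[i], feats)
    return f_measure(labels, pred)

def wrapper_select(rows, labels, candidates):
    """Greedy forward search (assumed): add features while the score improves."""
    selected, best = [], 0.0
    while True:
        scored = [(cv_score(rows, labels, selected + [f]), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        score, feat = max(scored)
        if score <= best:
            break
        selected.append(feat)
        best = score
    return selected, best

# Invented toy tokens: even positions are chemical mentions, odd ones are not.
labels = ["CHEM" if i % 2 == 0 else "O" for i in range(12)]
caps = [True, False, True, True, False, False, True, True, False, False, True, False]
prefixes = ["met", "the", "eth", "met", "eth", "and", "ben", "was", "met", "for", "eth", "the"]
tags = ["NN", "DT", "NN", "NN", "NN", "CC", "NN", "VB", "NN", "IN", "NN", "DT"]
rows = [{"capitalization": c, "punctuation": i % 2 == 0, "prefix": p, "pos": t}
        for i, (c, p, t) in enumerate(zip(caps, prefixes, tags))]

selected, score = wrapper_select(
    rows, labels, ["capitalization", "punctuation", "prefix", "pos"])
print(selected, score)
```

The key property of a wrapper approach is visible in `cv_score`: each candidate subset is judged by retraining and evaluating the classifier itself, rather than by a classifier-independent statistic. This ties the selection to the model's behaviour, at the cost of many training runs.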