JOURNAL ARTICLE

LncRNA-Disease Association Prediction Model Applying Distance-based Data Labeling

Jaein Kim, Seungwon Yoon, In-Woo Hwang, Kyu-Chul Lee

Year: 2023   Journal: Journal of KIISE   Vol: 50 (5)   Pages: 420-428   Publisher: Korea Information Science Society

Abstract

Long non-coding RNAs (lncRNAs) are non-coding RNAs of 200 or more nucleotides. Non-coding RNA was long considered unimportant because it cannot directly produce proteins, but it has since been found to play a role in regulating protein expression and is now the subject of active research. Abnormal expression of lncRNAs causes various diseases, so predicting associations between lncRNAs and diseases can help with early diagnosis or prevention. Predicting associations in biological data through direct experiments is time-consuming and costly, which makes it important to complement experiments with computational methods. In this study, we therefore propose a lncRNA-disease association prediction model based on Long Short-Term Memory (LSTM). In addition, because previous studies generated negative samples at random, their data carry uncertainty; we therefore also propose a distance-based data labeling method that addresses this uncertainty. With the proposed data labeling method and classification model, we achieved a highest AUC of 0.97.

Keywords:
RNA, Computer science, Long non-coding RNA, Coding (social sciences), Computational biology, Data mining, Artificial intelligence, Machine learning, Biology, Statistics, Mathematics, Genetics, Gene

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.22
Refs: 0
Citation Normalized Percentile: 0.69

Topics

Cancer-related molecular mechanisms research
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Cancer Research
Nutrition, Health and Food Behavior
Health Sciences →  Nursing →  Nutrition and Dietetics
Food Quality and Safety Studies
Life Sciences →  Agricultural and Biological Sciences →  Food Science
