Identification of splice sites plays a key role in alternative splicing analysis. Many effective methods have been proposed over the past decades. However, these methods still have limitations and need further improvement. In this paper, we collect splice site sequences from the Homo Sapiens Splice Sites Dataset (HS3D). We encode these sequences with two kinds of coding methods and then use support vector machines (SVM) as the predictor. To reduce computational time, maximum relevance minimum redundancy (mRMR) is adopted to rank the features and find the optimal feature combination. On the donor splice site data, our method achieves 92.85% accuracy and an area under the receiver operating characteristic (ROC) curve (ROC_AUC) of 97.62%. On the acceptor splice site data, our method achieves 92.29% accuracy and 97.37% ROC_AUC. These results show that our method is effective and reliable for splice site prediction.
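The mRMR ranking step can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the common difference form of the criterion (relevance minus mean redundancy), treats the raw nucleotide at each position as one categorical feature instead of the two coding methods, and runs on made-up toy sequences, not HS3D data. The names `mutual_information` and `mrmr_rank` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels, k):
    """Greedy mRMR (assumed difference form): at each step pick the feature
    with the highest relevance to the labels minus its mean redundancy
    with the features already selected."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up sequences (NOT HS3D data); label 1 = true site, 0 = false site.
sequences = ["GTA", "GTA", "GTC", "CAA", "CAC", "CAC", "CAA", "CAC"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# One categorical feature per sequence position.
columns = [[s[i] for s in sequences] for i in range(len(sequences[0]))]
ranking = mrmr_rank(columns, labels, k=3)
print(ranking)
```

In this toy data, position 1 carries exactly the same information as position 0, so once position 0 is selected the redundancy penalty pushes position 1 below the weaker but less redundant position 2. In a full pipeline, the top-ranked features would then be passed to the SVM, and the number of retained features would be chosen by comparing accuracy and ROC_AUC across feature subsets.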