Mukesh Kumar, Ransingh Biswajit Ray, Santanu Kumar Rath
Microarray data suffer from the curse of dimensionality: the number of features is huge compared with the number of samples. Data retrieved from microarrays are varied in nature and change over time. The vast amount of raw gene expression data often leads to computational and analytical challenges, including the classification of the dataset into the correct groups or classes. In this paper, various feature selection techniques based on statistical tests are proposed using the Spark framework. After the relevant features are selected using these statistical tests, an Artificial Neural Network (ANN) based on the Spark framework (sf-ANN) is proposed, which runs on a scalable cluster with multiple nodes. The performance of sf-ANN is tested on microarray datasets of various dimensions. A detailed comparative analysis of execution time is presented for the sf-ANN classifier on the Spark framework and on a conventional system (where data is stored on a standalone machine), in order to examine its performance.
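To make the feature selection step concrete, the following is a minimal single-machine sketch of ranking genes by a two-sample statistical test. It is not the authors' Spark implementation: the choice of Welch's t-statistic, the function names `welch_t` and `rank_features`, and the toy data are all assumptions made for illustration. In a Spark setting, the per-feature scoring loop would presumably be distributed across the cluster nodes.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(rows, labels, top_k):
    """Return indices of the top_k features by |t|, most discriminative first.

    rows   : list of samples, each a list of expression values (one per gene)
    labels : list of 0/1 class labels, one per sample
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        class0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        class1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append((abs(welch_t(class0, class1)), j))
    # Sort by descending score; break ties by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_k]]


# Toy data (made up for illustration): 6 samples, 3 genes.
# Gene 1 separates the two classes; genes 0 and 2 do not.
toy_rows = [
    [5.0, 1.0, 3.1],
    [5.2, 1.2, 2.9],
    [4.9, 0.9, 3.0],
    [5.1, 9.0, 3.0],
    [5.0, 9.3, 3.2],
    [4.8, 8.8, 2.8],
]
toy_labels = [0, 0, 0, 1, 1, 1]
selected = rank_features(toy_rows, toy_labels, top_k=1)
```

Only the columns listed in `selected` would then be passed to the classifier, which is how this kind of filter reduces the dimensionality before training.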
Santanu Kumar Rath, Ransingh Biswajit Ray, Mukesh Kumar
M. A. H. Akhand, Md. Asaduzzaman, Mir Hussain, M. M.
Rabia Musheer Aziz, Chandan Kumar Verma, Manoj K. Jha, Namita Srivastava