The indispensable knowledge of Deoxyribonucleic Acid (DNA) sequences and sharply reducing cost of the DNA sequencing techniques has attracted numerous researchers in the field of Genetics. These sequences are getting available at an exponential rate leading to the bulging size of molecular biology databases making large disk arrays and compute clusters inevitable for analysis.In this paper, we proposed referential DNA data compression using hadoop MapReduce Framework to process humongous amount of genetic data in distributed environment on high performance compute clusters. Our method has successfully achieved a better balance between compression ratio and the amount of time required for DNA data compression as compared to other Referential DNA Data Compression methods.
Thogaricheti AshwiniSahana LME MahalakshmiShweta S Padti