Suffix Tree Construction based Mapreduce

Sihem Klai Soukehal; Karima Chibane; Med Tarek Khadir

doi:10.1109/ictaacs48474.2019.8988123

JOURNAL ARTICLE

Year: 2019 Pages: 1-6

Abstract

Indexing a genome sequence is a primary step that facilitates later processing such as pattern search or assembly against a reference genome, and the suffix tree is one of the most widely used data structures for this purpose. However, the memory required to run suffix tree construction algorithms may exceed the available main memory. Despite researchers' efforts, suffix tree construction remains very expensive: it relies on data centres to parallelize the processing and reduce execution time, and it is exposed to the risk of failures and the problems they cause. The parallelization provided by Hadoop and MapReduce addresses the limits on storage and data processing capacity and provides fault tolerance, all at reasonable cost. With the emergence of Hadoop, a big data framework, and of the MapReduce paradigm, which models parallel and distributed processing, many scientific domains are investigating how to parallelize their processing effectively. The PWOTD (Partition and Write Only Top Down) algorithm is chosen here because it has proven itself among text algorithms for genome sequencing. This paper designs an approach that models the parallel construction of the suffix tree with the MapReduce paradigm, intended for implementation in Hadoop with a Java API.
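
The abstract does not spell out the MapReduce design, so the following is only a minimal sketch of how a partition-then-build scheme in the spirit of PWOTD could map onto map and reduce steps. It is plain Python that simulates the phases in memory, not Hadoop code and not the authors' implementation. The names `map_suffixes`, `shuffle`, `reduce_partition`, `build_subtree`, `build_forest`, and the prefix length of 2 are all hypothetical, and the input is assumed to end with a unique terminator such as `$`.

```python
from collections import defaultdict

PREFIX_LEN = 2  # assumed partition prefix length (illustrative value only)


def map_suffixes(text, prefix_len=PREFIX_LEN):
    """Map step: emit (prefix, start position) for every suffix of text."""
    for i in range(len(text)):
        yield text[i:i + prefix_len], i


def shuffle(pairs):
    """Group positions by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, pos in pairs:
        groups[key].append(pos)
    return groups


def build_subtree(text, positions, depth):
    """Top-down construction for suffixes that share their first `depth` characters.

    Returns a dict mapping edge labels to child subtrees; a leaf is the
    integer start position of its suffix. Relies on the unique terminator.
    """
    by_char = defaultdict(list)
    for p in positions:
        by_char[text[p + depth]].append(p)
    node = {}
    for _, group in sorted(by_char.items()):
        if len(group) == 1:
            node[text[group[0] + depth:]] = group[0]  # leaf edge: rest of the suffix
            continue
        # Extend the edge label while every suffix in the group agrees.
        lcp = 1
        while len({text[p + depth + lcp] for p in group}) == 1:
            lcp += 1
        label = text[group[0] + depth:group[0] + depth + lcp]
        node[label] = build_subtree(text, group, depth + lcp)
    return node


def reduce_partition(text, key, positions):
    """Reduce step: build the subtree hanging below one prefix."""
    if len(positions) == 1:
        return positions[0]
    return build_subtree(text, positions, len(key))


def build_forest(text, prefix_len=PREFIX_LEN):
    """Run map, shuffle, and reduce; returns {prefix: subtree below that prefix}."""
    groups = shuffle(map_suffixes(text, prefix_len))
    return {key: reduce_partition(text, key, pos) for key, pos in groups.items()}


def leaves(tree):
    """Collect the suffix start positions stored in a subtree."""
    if isinstance(tree, int):
        return [tree]
    return [p for child in tree.values() for p in leaves(child)]
```

For example, `build_forest("banana$")` produces one subtree per two-character prefix. Because each partition is built independently, the partitions could in principle be processed on different workers, and each one only has to fit in that worker's memory. Since the partition boundary is artificial, the root of a partition's subtree may have a single child.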

Keywords:

Computer science; Suffix tree; Search engine indexing; Suffix; Compressed suffix array; Generalized suffix tree; Suffix array; Parallel computing; Tree (set theory); Big data; Partition (number theory); Trie; Auxiliary memory; Data structure; Theoretical computer science; Data mining; Programming language; Artificial intelligence; Operating system

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.20

Topics

Algorithms and Data Compression

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Data Storage Technologies

Physical Sciences → Computer Science → Computer Networks and Communications

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
