Abstract

Indexing a genome sequence is a primary step that facilitates further processing such as pattern search or assembly against a reference genome. The suffix tree is one of the most widely used data structures for indexing genome sequences. However, the memory required to run suffix tree construction algorithms may exceed the available main memory. Despite researchers' efforts, suffix tree construction remains very expensive: it relies on data centres to parallelize processing and reduce execution time, and it is exposed to the risk of breakdowns and the problems they cause. Parallelization with Hadoop and MapReduce addresses the limits on storage and data processing capacity and provides fault tolerance, all at a reasonable cost. Since the emergence of Hadoop, a big data framework, and of the MapReduce paradigm, which models parallel and distributed processing, many scientific domains have been investigating them as a way to parallelize their processing effectively. The PWOTD (Partition and Write Only Top Down) algorithm is chosen here because it has proven itself among text algorithms for genome sequencing. This paper designs an approach that models the parallel construction of the suffix tree with the MapReduce paradigm, intended for implementation in Hadoop with a Java API.
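As a rough illustration of how prefix partitioning can fit the MapReduce model, the sketch below simulates the two phases in plain Java, without the Hadoop API. It is not the paper's design: the class name `PrefixPartitionSketch`, the methods `mapPhase` and `reducePhase`, the fixed prefix length and the `$` terminator are all assumptions made for this example. The reduce step only sorts the suffixes of each partition, as a stand-in for building that partition's subtree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: simulates map and reduce phases in memory, no Hadoop dependency.
public class PrefixPartitionSketch {

    // "Map" step: emit (prefix, suffix start) for every suffix and group by prefix.
    // Suffixes shorter than prefixLen use whatever characters remain as their key.
    static Map<String, List<Integer>> mapPhase(String text, int prefixLen) {
        Map<String, List<Integer>> partitions = new TreeMap<>();
        for (int i = 0; i < text.length(); i++) {
            int end = Math.min(i + prefixLen, text.length());
            String key = text.substring(i, end);
            partitions.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return partitions;
    }

    // "Reduce" step (stand-in): order one partition's suffixes lexicographically.
    // A real implementation would build the subtree for this prefix instead.
    static List<Integer> reducePhase(String text, List<Integer> starts) {
        List<Integer> sorted = new ArrayList<>(starts);
        sorted.sort((a, b) -> text.substring(a).compareTo(text.substring(b)));
        return sorted;
    }

    public static void main(String[] args) {
        String text = "ACGACG$"; // '$' is an assumed end-of-sequence marker
        for (Map.Entry<String, List<Integer>> e : mapPhase(text, 1).entrySet()) {
            System.out.println(e.getKey() + " -> " + reducePhase(text, e.getValue()));
        }
    }
}
```

Because every suffix in a partition shares the same prefix, the partitions can be processed independently, which is what makes the work distributable across reducers. In this toy input, the partition for prefix `A` holds suffix starts 0 and 3, and the reduce step orders them as 3, 0.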

Keywords:
Computer science; Suffix tree; Search engine indexing; Suffix; Compressed suffix array; Generalized suffix tree; Suffix array; Parallel computing; Tree (set theory); Big data; Partition (number theory); Trie; Auxiliary memory; Data structure; Theoretical computer science; Data mining; Programming language; Artificial intelligence; Operating system

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 21
Citation Normalized Percentile: 0.20

Topics

Algorithms and Data Compression
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Data Storage Technologies
Physical Sciences →  Computer Science →  Computer Networks and Communications
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Suffix Tree Construction

Jens Stoye

Encyclopedia of Algorithms Year: 2016 Pages: 2144-2149
BOOK-CHAPTER

Suffix Tree Construction

Jens Stoye

Encyclopedia of Algorithms Year: 2014 Pages: 1-6
BOOK-CHAPTER

Practical Suffix Tree Construction

Sandeep Tata, Richard A. Hankins, Jignesh M. Patel

Elsevier eBooks Year: 2004 Pages: 36-47