Distributed arrays: an algebra for generic distributed query processing

Ralf Hartmut Güting; Thomas Behr; Jan Kristof Nidzwetzki

doi:10.1007/s10619-021-07325-2

ScienceGate Book Chapters

JOURNAL ARTICLE

Distributed arrays: an algebra for generic distributed query processing

Ralf Hartmut Güting Thomas Behr Jan Kristof Nidzwetzki

Year: 2021 Journal: Distributed and Parallel Databases Publisher: Springer Science+Business Media

DOI: 10.1007/s10619-021-07325-2

Get Full-Text PDF Get Analytical Report

Abstract

Abstract We propose a simple model for distributed query processing based on the concept of a distributed array . Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra . The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations called the basic algebra implemented by some piece of software called the basic engine . It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine called the master which controls a set of basic engines called the workers . It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. As a basic engine the Secondo system is used, a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, or a DBScan implementation.

Keywords:

Computer science DBSCAN Query language Extensibility Theoretical computer science Set (abstract data type) Distributed database Query optimization Distributed design patterns Distributed computing Distributed algorithm Programming language Database Cluster analysis Artificial intelligence

Metrics

Cited By

0.17

FWCI (Field Weighted Citation Impact)

Refs

0.47

Citation Normalized Percentile

Is in top 1%

Is in top 10%

Citation History

Topics

Advanced Database Systems and Queries

Physical Sciences → Computer Science → Computer Networks and Communications

Data Management and Algorithms

Physical Sciences → Computer Science → Signal Processing

Cloud Computing and Resource Management

Physical Sciences → Computer Science → Information Systems

Distributed arrays: an algebra for generic distributed query processing

Abstract

Metrics

Citation History

Topics

Related Documents

Distributed Arrays – An Algebra for Generic Distributed Query Processing

Distributed Query Processing Using Suffix Arrays

Distributed Query Processing

Distributed Query Processing

Distributed Query Processing