JOURNAL ARTICLE

MuSQLE: Distributed SQL query execution over multiple engine environments

Abstract

Multi-engine analytics has been gaining an increasing amount of attention from both the academic and the industrial community as it can successfully cope with the heterogeneity and complexity that the plethora of frameworks, technologies and requirements have brought forth. It is now common for a data analyst to combine data that resides on multiple and totally independent engines and perform complex analytics queries. Multi-engine solutions based on SQL can facilitate such efforts, as SQL is a popular standard that the majority of data-scientists understands. Existing solutions propose a middleware that centrally optimizes query execution for multiple engines. Yet, this approach requires manual integration of every primitive engine operator along with its cost model, rendering the process of adding new operators or engines highly inextensible. To address this issue we present MuSQLE, a system for SQL-based analytics over multi-engine environments. MuSQLE can efficiently utilize external SQL engines allowing for both intra and inter engine optimizations. Our framework adopts a novel API-based strategy. Instead of manual integration, MuSQLE specifies a generic API, used for the cost estimation and query execution, that needs to be implemented for each SQL engine endpoint. Our engine API is integrated with a state-of-the-art query optimizer, adding support for location-based, multi-engine query optimization and letting individual runtimes perform sub-query physical optimization. The derived multi-engine plans are executed using the Spark distributed execution framework. Our detailed experimental evaluation, integrating PostgreSQL, MemSQL and SparkSQL under MuSQLE, demonstrates its ability to accurately decide on the most suitable execution engine. MuSQLE can provide speedups of up to 1 order of magnitude for TPCH queries, leveraging different engines for the execution of individual query parts.

Keywords:
Computer science SQL Query by Example Query optimization Analytics Sargable SPARK (programming language) Database Query language Search engine Web search query Information retrieval Programming language

Metrics

24
Cited By
3.28
FWCI (Field Weighted Citation Impact)
20
Refs
0.93
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Advanced Database Systems and Queries
Physical Sciences →  Computer Science →  Computer Networks and Communications
Data Management and Algorithms
Physical Sciences →  Computer Science →  Signal Processing
Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
© 2026 ScienceGate Book Chapters — All rights reserved.