JOURNAL ARTICLE

Optimizing Query Processing Under Skew

Abstract

Big data systems such as relational databases, data science platforms, and scientific workflows all process queries over large and complex datasets. Skew is common in these real-world datasets and workloads, and different types of skew affect query processing performance in different ways. Although skew sometimes causes load imbalance in a parallel execution environment, which hurts query performance, we demonstrate in this thesis that in many cases we can actually improve query performance in the presence of skew. To optimize query processing under skew, we develop a set of techniques that exploit the positive effects of skew and avoid the negative ones. To exploit skew, we propose techniques including: (a) intentionally creating skew and clustering data in a distributed database system; (b) optimizing data layout for better caching in main-memory databases; and (c) adaptive execution techniques that are responsive to the underlying data in the context of compilers. To ameliorate skew, we study optimized hash-based partitioning that alleviates outliers in a genomic data context, as well as parallel prefix sum algorithms that are used to develop skew-insensitive algorithms. We evaluate the effectiveness of our techniques on synthetic data, standard benchmarks, and empirical datasets, and show that the performance of query processing under skew can be greatly improved. Overall, this thesis makes a concrete contribution to skew-related query processing.
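As a rough illustration of the load imbalance the abstract mentions, the sketch below hash-partitions two key sets across workers and compares the heaviest worker to the average. It is not taken from the thesis: the function names `partition_loads` and `imbalance`, the use of integer keys with modulo as a stand-in hash function, and the toy data are all assumptions made for this example.

```python
def partition_loads(keys, num_workers):
    """Count how many tuples each worker receives under hash partitioning.

    Assumption: keys are non-negative integers and `key % num_workers`
    stands in for a real hash function.
    """
    loads = [0] * num_workers
    for key in keys:
        loads[key % num_workers] += 1
    return loads


def imbalance(loads):
    """Ratio of the heaviest worker's load to the average load.

    A value of 1.0 means the work is perfectly balanced.
    """
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical toy inputs: 1000 tuples each.
uniform_keys = list(range(1000))             # every key appears once
skewed_keys = [7] * 900 + list(range(100))   # one "heavy hitter" key dominates

uniform_loads = partition_loads(uniform_keys, 4)
skewed_loads = partition_loads(skewed_keys, 4)

print(uniform_loads, imbalance(uniform_loads))
print(skewed_loads, imbalance(skewed_loads))
```

In this toy setting the uniform keys spread evenly, while the single heavy key sends most tuples to one worker, so the slowest worker would bound the parallel running time. This is the kind of negative effect the skew-ameliorating techniques in the abstract aim to avoid.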

Keywords:
Computer science, Skew

Topics

Advanced Database Systems and Queries
Physical Sciences →  Computer Science →  Computer Networks and Communications
Data Management and Algorithms
Physical Sciences →  Computer Science →  Signal Processing
Graph Theory and Algorithms
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Optimizing Distributed Query Processing.

Seyed H. Roosta

Journal: Parallel and Distributed Processing Techniques and Applications Year: 2005 Vol: 51 (4) Pages: 869-875

BOOK-CHAPTER

On Optimizing Workflows Using Query Processing Techniques

Georgia Kougka, Anastasios Gounaris

Lecture notes in computer science Year: 2012 Pages: 601-606

JOURNAL ARTICLE

Optimizing Skyline Query Processing in Incomplete Data

Yonis Gulzar, Ali A. Alwan, Sherzod Turaev

Journal: IEEE Access Year: 2019 Vol: 7 Pages: 178121-178138
© 2026 ScienceGate Book Chapters — All rights reserved.