JOURNAL ARTICLE

Effectively sharing a cache among threads

Abstract

We compare the number of cache misses M1 incurred when running a computation on a single processor with cache size C1 to the total number of misses Mp incurred by the same computation when it uses p processors or threads and a shared cache of size Cp. We show that for any computation, with an appropriate (greedy) parallel schedule, if Cp ≥ C1 + p·d then Mp ≤ M1, where the depth d of the computation is the length of the critical path of dependences. This gives the perhaps surprising result that, for sufficiently parallel computations, the shared cache need only be larger than the single-processor cache by an additive amount, and it provides some theoretical justification for designing machines with shared caches.

We model a computation as a DAG and the sequential execution as a depth-first schedule of the DAG. The parallel schedule we study is a parallel depth-first schedule (PDF schedule) based on the sequential one. The schedule is greedy and therefore work-efficient. Our main results assume the Ideal Cache model, but we also present results for other, more realistic cache models.
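The pieces named in the abstract (a computation DAG, its depth d, a sequential depth-first schedule, the greedy PDF schedule derived from it, and miss counting under an ideal cache) can be sketched in a few functions. This is an illustrative Python sketch, not the authors' code, and it rests on several assumptions not stated in the abstract: a DAG is a dict mapping each node to its ordered list of children; d counts the nodes on the longest path; the sequential schedule always runs the most recently enabled ready node; each node makes exactly one memory access (the hypothetical `loc` map); accesses within a parallel step are issued in sequential-priority order; and the Ideal Cache model is approximated by a fully associative cache with Belady's optimal offline replacement and one location per line. All function names are hypothetical.

```python
from typing import Dict, Hashable, List

Node = Hashable
DAG = Dict[Node, List[Node]]  # node -> ordered children; every node must be a key


def indegrees(dag: DAG) -> Dict[Node, int]:
    # Number of parents (incoming dependence edges) of each node.
    count = {v: 0 for v in dag}
    for v in dag:
        for c in dag[v]:
            count[c] += 1
    return count


def depth(dag: DAG) -> int:
    # Critical-path length, counted here as the number of nodes on the longest path.
    memo: Dict[Node, int] = {}

    def longest(v: Node) -> int:
        if v not in memo:
            memo[v] = 1 + max((longest(c) for c in dag[v]), default=0)
        return memo[v]

    return max(longest(v) for v in dag)


def one_df_order(dag: DAG) -> List[Node]:
    # Sequential depth-first schedule (assumed tie-breaking): always run the most
    # recently enabled ready node; enabled children are pushed so the leftmost runs first.
    remaining = indegrees(dag)
    stack = [v for v in reversed(list(dag)) if remaining[v] == 0]
    order: List[Node] = []
    while stack:
        v = stack.pop()
        order.append(v)
        enabled = []
        for c in dag[v]:
            remaining[c] -= 1
            if remaining[c] == 0:
                enabled.append(c)
        stack.extend(reversed(enabled))
    return order


def pdf_schedule(dag: DAG, p: int) -> List[List[Node]]:
    # Greedy parallel depth-first schedule: each step runs the up to p ready nodes
    # that come earliest in the sequential order, listed in that priority order.
    priority = {v: i for i, v in enumerate(one_df_order(dag))}
    remaining = indegrees(dag)
    ready = [v for v in dag if remaining[v] == 0]
    steps: List[List[Node]] = []
    while ready:
        ready.sort(key=priority.__getitem__)
        step, ready = ready[:p], ready[p:]
        steps.append(step)
        for v in step:
            for c in dag[v]:
                remaining[c] -= 1
                if remaining[c] == 0:
                    ready.append(c)
    return steps


def ideal_cache_misses(trace: List[Hashable], capacity: int) -> int:
    # Fully associative cache (capacity >= 1) with Belady's optimal offline
    # replacement: on a miss with a full cache, evict the line whose next use
    # is farthest in the future (or that is never used again).
    never = len(trace)
    next_use = [never] * len(trace)
    last_seen: Dict[Hashable, int] = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], never)
        last_seen[trace[i]] = i
    cache: Dict[Hashable, int] = {}  # cached line -> index of its next use
    misses = 0
    for i, x in enumerate(trace):
        if x not in cache:
            misses += 1
            if len(cache) >= capacity:
                del cache[max(cache, key=cache.__getitem__)]
        cache[x] = next_use[i]
    return misses


if __name__ == "__main__":
    # Hypothetical fork-join DAG; `loc` gives the one location each node touches.
    dag = {"a": ["b", "e"], "b": ["c", "d"], "c": ["f"], "d": ["f"],
           "e": ["g"], "f": ["g"], "g": []}
    loc = {"a": "x", "b": "y", "c": "x", "d": "z", "e": "z", "f": "y", "g": "x"}
    p, c1 = 2, 1
    d = depth(dag)
    cp = c1 + p * d  # shared-cache size from the bound Cp >= C1 + p*d
    seq = [loc[v] for v in one_df_order(dag)]
    par = [loc[v] for step in pdf_schedule(dag, p) for v in step]
    print(d, ideal_cache_misses(seq, c1), cp, ideal_cache_misses(par, cp))  # prints: 5 7 11 3
```

With p = 1 the PDF schedule reduces to the sequential order, and the scheduler is greedy because every step runs min(p, number of ready nodes) nodes. The toy run satisfies Mp ≤ M1 trivially, since Cp exceeds the number of distinct locations; it shows how the parts fit together rather than exercising the theorem, whose proof depends on properties of the PDF schedule.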

Keywords:
Computer science, Cache, Parallel computing, Operating system

