JOURNAL ARTICLE

Effective cache prefetching on bus-based multiprocessors

Dean M. TullsenSusan J. Eggers

Year: 1995 Journal:   ACM Transactions on Computer Systems Vol: 13 (1)Pages: 57-88   Publisher: Association for Computing Machinery

Abstract

Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen by current and future high-performance processors. However, prefetching is not without costs, particularly on a shared-memory multiprocessor. Prefetching can negatively affect bus utilization, overall cache miss rates, memory latencies and data sharing. We simulate the effects of a compiler-directed prefetching algorithm, running on a range of bus-based multiprocessors. We show that, despite a high memory latency, this architecture does not necessarily support prefetching well, in some cases actually causing performance degradations. We pinpoint several problems with prefetching on a shared-memory architecture (additional conflict misses, no reduction in the data-sharing traffic and associated latencies, a multiprocessor's greater sensitivity to memory utilization and the sensitivity of the cache hit rate to prefetch distance) and measure their effect on performance. We then solve those problems through architectural techniques and heuristics for prefetching that could be easily incorporated into a compiler: (1) victim caching, which eliminates most of the cache conflict misses caused by prefetching in a direct-mapped cache, (2) special prefetch algorithms for shared data, which significantly improve the ability of our basic prefetching algorithm to prefetch individual misses, and (3) compiler-based shared-data restructuring, which eliminates many of the invalidation misses the basic prefetching algorithm does not predict. The combined effect of these improvements is to make prefetching effective over a much wider range of memory architectures.

Keywords:
Instruction prefetch Computer science Parallel computing Cache CAS latency Cache pollution Cache-only memory architecture Latency (audio) Shared memory CPU cache Cache coloring Operating system Cache algorithms Memory controller Semiconductor memory

Metrics

41
Cited By
1.40
FWCI (Field Weighted Citation Impact)
33
Refs
0.83
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Parallel Computing and Optimization Techniques
Physical Sciences →  Computer Science →  Hardware and Architecture
Distributed systems and fault tolerance
Physical Sciences →  Computer Science →  Computer Networks and Communications
Advanced Data Storage Technologies
Physical Sciences →  Computer Science →  Computer Networks and Communications
© 2026 ScienceGate Book Chapters — All rights reserved.