In multicore and multicached architectures, cache coherence is ensured with a coherence protocol. However, the performance benefits of caching diminishes due to the cost associated with the protocol implementation. In this paper, we propose a novel technique to improve the performance of multithreaded programs running on shared-memory multicore processors by embracing approximate computing. Our idea is to relax the coherence requirement selectively in order to reduce the cost associated with a cache-coherence protocol, and at the same time, ensure a bounded QoS degradation with probabilistic reliability. In particular, we detect instructions in a multithreaded program that write to shared data, we call them Shared-Write-Access-Points (SWAPs), and propose an automated statistical analysis to identify those which can tolerate coherence faults. We call such SWAPs approximable. Our experiments on 9 applications from the SPLASH 3.0 benchmarks suite reveal that an average of 57% of the tested SWAPs are approximable. To leverage this observation, we propose an adapted cache-coherence protocol that relaxes the coherence requirement on stores from approximable SWAPs. Additionally, our protocol uses stale values for load misses due to coherence, the stale value being the version at the time of invalidation. We observe an average of 15% reduction in CPU cycles and 11% reduction in energy footprint from architectural simulation of the 9 applications using our approximate execution scheme.
Muhammad Abdullah HanifVojtěch MrázekMuhammad Shafique
Muhammad Abdullah HanifVojtěch MrázekMuhammad Shafique
Yunn-Yen ChenJih-Kwon PeirChung‐Ta King
Rafael H. Saavedra-BarreraDavid CullerThorsten von Eicken
Sourav BhattacharyaHoracio González–Vélez