Designing a high-performance memory subsystem for the heterogeneous multicore processor FT64-3, which features 18 on-chip 64-bit floating-point function units, is challenging. In this paper, we propose a parallel stream memory architecture that exploits memory-level parallelism to achieve higher memory throughput. Experimental results and analysis for kernel algorithms are presented to show the efficiency and rationale of our design. With our parallel stream memory architecture, the performance of FT64-3 is 2–3 orders better than that of FT64-2 when running at the same clock frequency of 500 MHz, and is comparable to that of Itanium2 running at 1.6 GHz, but at lower hardware cost.
Rangyu Deng, Weixia Xu, Qiang Dou, Hongwei Zhou, Zefu Dai, Haiyan Chen
Yasutaka Wada, Akihiro Hayashi, Takeshi Masuura, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara
Shih-Hao Ou, Che-Wei Yeh, Tai-Jyi Lin, Chih-Wei Liu