Abstract
Existing data prefetching algorithms cannot meet the demands that high energy-efficiency heterogeneous computing systems place on efficient access to novel heterogeneous memory, which combines dynamic random access memory (DRAM) with non-volatile memory (NVM). To address this, a simulated annealing data prefetching algorithm (SADPA) with global optimization is proposed. Building on the heuristic search of simulated annealing, the algorithm introduces a random factor to avoid local optima and thereby determines a globally optimal threshold that governs the effective number of NVM pages to prefetch. Experimental results show that, compared with the static threshold adjustment algorithm, SADPA reduces the average access latency by 4% and increases the average instructions per cycle (IPC) by 10.1%. For the cactusADM application, SADPA reduces system energy consumption by 3.4% compared with the hardware/software cooperative dynamic threshold adjustment algorithm.
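The threshold search summarized in the abstract can be illustrated with a minimal, generic simulated annealing sketch. It is not the SADPA implementation: the cost function `toy_cost`, the threshold range, the neighbourhood steps, and the cooling parameters are all hypothetical, chosen only to show how random acceptance of worse candidates lets the search leave a local optimum.

```python
import math
import random


def toy_cost(threshold):
    """Hypothetical cost curve (not from the paper): a local minimum at 8
    with cost 5 and the global minimum at 24 with cost 0."""
    return min((threshold - 8) ** 2 + 5, (threshold - 24) ** 2)


def anneal_threshold(cost, lo, hi, t_start=10.0, t_end=0.01, alpha=0.9,
                     steps_per_temp=20, seed=0):
    """Search the integer range [lo, hi] for a threshold with low cost.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    current = rng.randint(lo, hi)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    temp = t_start
    while temp > t_end:
        for _ in range(steps_per_temp):
            # Propose a neighbouring threshold, clamped to the valid range.
            step = rng.choice([-2, -1, 1, 2])
            candidate = min(hi, max(lo, current + step))
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements. Accept a worse candidate with
            # probability exp(-delta / temp); this random factor is what
            # allows the search to escape a local optimum.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        # Geometric cooling: worse moves become less likely over time.
        temp *= alpha
    return best, best_cost


if __name__ == "__main__":
    best, best_cost = anneal_threshold(toy_cost, lo=0, hi=32)
    print("threshold:", best, "cost:", best_cost)
```

In a real system the cost would come from measured quantities such as access latency or energy under a given prefetch threshold, rather than from a closed-form curve; whether the search settles on the global minimum depends on the cooling schedule and the random seed.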