A Holistic Energy-Efficient Approach for a Processor-Memory System

Authors: Feihao Wu; Juan Chen; Yong Dong; Wenxu Zheng; Xiaodong Pan; Yuan Yuan; Zhixin Ou; Yuyang Sun
Keywords: processor overclocking; memory overclocking; performance boost; total power control; energy efficiency
Chinese journal title: QHDY
English journal title: Tsinghua Science and Technology (清华大学学报自然科学版(英文版))
Affiliation: College of Computer, National University of Defense Technology
Publication date: 2019-04-10
Publisher: Tsinghua Science and Technology
Year: 2019
Issue: v.24
Funding: National Key Research and Development Program of China (No. 2018YFB1003203); Advanced Research Project of China (No. 31511010203); Open Fund from the State Key Laboratory of High Performance Computing (No. 201503-02); Research Program of NUDT (No. ZK18-03-10)
Language: English
Page: QHDY201904010
Number of pages: 16
CN: 04
ISSN: 11-3745/N
Classification number: 100-115

Abstract

Component overclocking, which includes processor overclocking and memory overclocking, is an effective way to speed up system components and achieve higher program performance. However, overclocking unavoidably increases power consumption. Our goal is to improve the performance of scientific computing applications as much as possible without increasing the total power consumption of a processor-memory system. We built a processor-memory energy efficiency model for multicore-based systems that coordinates the performance and power of the processor and memory. Depending on the scientific application, the model exploits performance boost opportunities by adopting processor overclocking, processor Dynamic Voltage and Frequency Scaling (DVFS), memory active ratio adjustment, and memory overclocking. The model also provides a total power control method based on the same four factors. We propose a processor and memory Coordination-based holistic Energy-Efficient (CEE) algorithm, which improves performance without increasing the total power consumption. Experimental results show an average performance improvement of 9.3% across all 14 benchmarks, with no increase in total power consumption. The maximal performance improvement, 13.1%, was obtained on the dedup benchmark. Our experiments validate the effectiveness of our holistic energy-efficient model and technology.

