Research on Energy-Saving Techniques for Memory Systems Based on Program Memory Access Patterns
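Abstract
Advances in semiconductor technology and the demand of applications for computing power drive the development of computer architecture. The data and instructions that a computer system processes all come from the memory system, which therefore plays a critical role in computing performance. To keep up with ever-increasing computing power, the memory hierarchy, both on-chip caches and off-chip main memory, has become increasingly complex in organization and increasingly large in capacity. As a result, the memory hierarchy accounts for a growing share of total system energy consumption. Energy consumption has become a key factor in current computer system design, so reducing the power consumption of the memory hierarchy, especially of on-chip caches and off-chip main memory, has become an important problem of broad concern to researchers.
     Current low-power research on memory systems focuses mainly on caches and main memory, because these two account for the most significant share of energy in the memory hierarchy. Existing main-memory energy-saving techniques aim to make full use of the hardware energy-saving support built into the devices, namely memory banking and low-power states, and save energy by combining software and hardware. In addition, emerging non-volatile memories, with their low static energy and high density, are being considered for building low-power memory systems. Examples include hybrid PCM/DRAM main memory, a design that combines the strengths of both devices, and non-volatile caches based on STT-RAM. All of these techniques concentrate on exploiting device capabilities and neglect adapting the devices to program behavior, so they cannot fully realize the benefits of hardware-software cooperation.
     Starting from program memory access behavior, this dissertation analyzes three aspects of memory access patterns: how applications access memory banks, how applications write cache blocks, and how applications write physical pages. Based on these access patterns and existing low-power memory techniques, this dissertation proposes energy-saving techniques for memory systems guided by program memory access patterns. The work covers the following three areas:
     1. By tracing the addresses of the physical pages that a program accesses, this dissertation builds a model that maps program data to memory banks. Based on this model, it formulates the problem of how program execution order affects memory-bank energy saving and solves it with network-flow techniques. It further examines how programs running on different processors of a multi-core system affect memory-bank energy saving and builds an abstract model of that problem. For dual-core systems, it gives an optimal algorithm; when there are more than two processors the problem is NP-hard, and two heuristic algorithms are proposed.
     2. This dissertation studies program write accesses to the cache and finds that writes to a cache block are concentrated. Based on this pattern, it proposes a selective read-before-write mechanism that tracks and marks dirty words in the cache and, when a cache block is written back, uses these marks to trigger read-before-write only for the dirty region of the block. Because multi-level STT-RAM caches have even higher write energy, it compares the energy of performing write updates in place in the multi-level STT-RAM cache with the energy of performing them in SRAM, and proposes a companion write cache structure with its associated mechanism, which moves the energy-hungry write updates into the companion write cache to save energy.
     3. This dissertation studies program write accesses to physical pages and identifies two basic write patterns. Based on these patterns, it proposes an adaptive data migration strategy with variable migration granularity, which identifies the write pattern of the current physical page and migrates data from PCM to DRAM at a granularity suited to that page, reducing migration cost while exploiting the low write energy of DRAM. Under this hybrid memory management scheme, it also proposes a partial refresh strategy: using information from the migration module, it marks which data in each DRAM refresh unit are valid and refreshes only the valid data, reducing refresh energy.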
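     The selective read-before-write idea in item 2 can be illustrated with a minimal sketch. This is not the dissertation's implementation: the names `CacheLine`, `write_word`, `write_back`, and `WORDS_PER_LINE`, the 8-word line size, and the use of flipped bits as a stand-in for write energy are all assumptions made for illustration. The sketch only shows how a per-word dirty mask limits the read-before-write step to the words that were actually modified.

```python
from dataclasses import dataclass

WORDS_PER_LINE = 8  # assumed line layout: 8 words per cache line


@dataclass
class CacheLine:
    """Hypothetical cache line holding data words and a per-word dirty mask."""
    words: list
    dirty_mask: int = 0

    def write_word(self, index, value):
        # Update the word and mark it dirty so write-back can find it later.
        self.words[index] = value
        self.dirty_mask |= 1 << index


def write_back(line, stt_ram_line):
    """Write a line back, doing read-before-write only for dirty words.

    Returns (words_read, bits_flipped). The bit-flip count is an assumed
    proxy for write energy, not a measured quantity.
    """
    words_read = 0
    bits_flipped = 0
    for i in range(WORDS_PER_LINE):
        if not (line.dirty_mask >> i) & 1:
            continue  # clean word: no read, no write
        old = stt_ram_line[i]          # the "read" of read-before-write
        words_read += 1
        bits_flipped += bin(old ^ line.words[i]).count("1")
        stt_ram_line[i] = line.words[i]
    line.dirty_mask = 0
    return words_read, bits_flipped


if __name__ == "__main__":
    line = CacheLine(words=[0] * WORDS_PER_LINE)
    stored = [0] * WORDS_PER_LINE
    line.write_word(2, 0b1011)
    print(write_back(line, stored))
```

     In this sketch a line with one modified word triggers one word read instead of eight, which is the saving that dirty-word tracking is meant to expose.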
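     Finally, the proposed techniques are validated and tested through a combination of real measurement and software simulation, using the real hardware platform PandaBoard, the full-system simulator gem5, and other tools such as SESC and CACTI. The experimental results show that the proposed mechanisms and structures reduce the write energy and static energy of caches and main memory further than existing techniques.
     By studying program memory access patterns, specifically how applications access memory banks, how applications write cache blocks, and how applications write physical pages, and by combining them with the energy-saving hardware support of existing memory devices, this dissertation further exploits the energy-saving potential of those devices under the guidance of access patterns. The experimental results confirm that memory energy-saving techniques that incorporate software access patterns achieve good energy savings.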
The development of computer architecture is driven by progress in semiconductor technology and by the computing power that applications require. The memory system plays a key role in computer architecture because it supplies instructions and data to the processors. To keep pace with processor speed, the memory hierarchy, both the on-chip cache and the off-chip main memory, has become larger and more complex. As a result, the memory hierarchy consumes a large portion of system energy. Since energy has become a major constraint in current computer system design, the energy consumption of the memory system demands a great deal of attention.
     Most current research focuses on the on-chip cache and the main memory, since these two are the most energy-hungry. Manufacturers provide memory banking and low-power states for the main memory to save energy. In addition, non-volatile memory has advantages such as low leakage power and high density. To combine the strengths of non-volatile memory and DRAM, hybrid PCM/DRAM main memory has been proposed, and the non-volatile memory STT-RAM has been adopted for energy-efficient cache design.
     In this dissertation, we study the memory access patterns of applications, including how applications access the physical pages in main memory, how the processor writes cache blocks, and how applications write physical pages. Based on these access patterns and existing work, we make the following contributions:
     1. First, we propose an OS-based methodology to reduce the energy overhead caused by memory mode transitions. In particular, we propose an algorithm that optimizes the execution order of processes to reduce the number of memory mode transitions and to create long idle intervals for energy saving. Second, we extend the work to multi-core devices. In particular, processes are scheduled according to their memory access characteristics to maximize the number of memory banks in low-power mode. A fast approximation algorithm and two heuristic algorithms are proposed.
     2. First, we study the pattern of write accesses from the CPU to the cache and propose a Selective Read-Before-Write (SRW) scheme to further reduce the dynamic write energy of the STT-RAM cache. Additional optimizations are included in the design of SRW so that it saves a considerable amount of energy at negligible overhead. Second, to address the write energy of multi-level cell (MLC) STT-RAM, we propose the Companion Write Cache (CWC), a small fully associative SRAM cache that absorbs the energy-consuming write updates from the MLC STT-RAM cache.
     3. First, we study the write access pattern within each physical page and find two typical write patterns. We then propose an adaptive data migration strategy for hybrid PCM/DRAM main memory. The basic idea is to detect the write pattern of a physical page and then migrate the write-hot data of the page at different granularities instead of at whole-page granularity alone. Second, we propose partial data refresh management to reduce the energy consumption of DRAM. Based on the data migration information, the DRAM refresh controller is modified so that it is aware of the valid data within each refresh unit and refreshes only those valid data to save energy.
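     The partial refresh idea in item 3 can be sketched as a validity map consulted by the refresh pass. This is an illustrative model under stated assumptions, not the controller described in the dissertation: the class `PartialRefreshDram`, its methods `migrate_in`, `evict`, and `refresh_pass`, and the choice of a row as the unit of validity are hypothetical, and the count of refreshed rows stands in for refresh energy.

```python
class PartialRefreshDram:
    """Hypothetical DRAM model that refreshes only rows holding valid data."""

    def __init__(self, num_units, rows_per_unit):
        # One validity flag per row in each refresh unit; all start invalid.
        self.valid = [[False] * rows_per_unit for _ in range(num_units)]

    def migrate_in(self, unit, row):
        # Data migrated into DRAM makes the target row valid.
        self.valid[unit][row] = True

    def evict(self, unit, row):
        # Data moved back out of DRAM leaves the row with nothing to retain.
        self.valid[unit][row] = False

    def refresh_pass(self):
        """Run one refresh pass and return the number of rows refreshed."""
        refreshed = 0
        for unit in self.valid:
            for is_valid in unit:
                if is_valid:
                    refreshed += 1  # only valid rows cost refresh energy
        return refreshed


if __name__ == "__main__":
    dram = PartialRefreshDram(num_units=2, rows_per_unit=4)
    dram.migrate_in(0, 1)
    dram.migrate_in(1, 3)
    print(dram.refresh_pass())
```

     A conventional refresh pass over this model would touch all eight rows; with the validity map only the rows populated by migration are refreshed.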
     Finally, we evaluate the proposed strategies and techniques using a real hardware platform (PandaBoard), the full-system simulator gem5, and other tools such as SESC and CACTI. Experimental results show that the proposed techniques and organizations save more memory system energy than existing schemes.
     In this dissertation, we study the access patterns of applications, including how applications access memory banks, how applications write cache blocks, and how applications write physical pages. Based on the patterns we observe, we propose novel techniques and organizations that exploit the energy-saving potential of existing hardware. The experimental results demonstrate that our proposals save more energy than existing techniques.
引文
[1]John L. Hennessy, David A. Patterson. Computer Architecture:A Quntitative Approach[B]. Morgan Kauffman,2000.
    [2]Venkatachalam Vasanth, Franz Michael. Power reduction techniques for microprocessor systems[J]. ACM Computing Survey,2005,37(3):195-237.
    [3]Kim Nam Sung, Flautner Krisztian, Blaauw David, Mudge Trevor. Circuit and microarchitectural techniques for reducing cache leakage power[J]. IEEE Transaction on Very Large Scale Integrated System,2004,12(1):167-184.
    [4]Howard J., Dighe S., Hoskote Y., Vangal S., Finan D., et al. A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS[C]. Proceedings of the 2010 IEEE International Solid-State Circuits Conference, Washiton, USA, 2012:108-121.
    [5]Wilkerson Chris, Alameldeen Alaa R, Chishti, Zeshan, Wu Wei, Somasekhar Dinesh, Lu Shih-lien. Reducing cache power with low-cost, multi-bit error-correcting codes[J]. ACM SIGARCH Computer Architecture News,2010, 38(3):83-93.
    [6]Carroll, GHeiser. An analysis of power consumption in a smartphone[C]. Proceedings of the 10th Annual USENIX Technical Conference, Berkerly, USA, 2010:21-30.
    [7]M.A.Viredaz and D.A.Wallach. Power evaluation of a handheld computer[J]. IEEE Microarchitecutre,2003,23(3):66-74.
    [8]ITRS 2011 Edition. http://www.itrs.net/Links/2011ITRS/Home2011.htm
    [9]Lefurgy Charles, Rajamani Karthick and Rawson Freeman. Energy management for commercial servers[J]. Jounal of Computer,2003,36(2):39-48.
    [10]V.D.L.Luz, M.Kandemir, I.Kolcu. Automatic data migration for reducing energy consumption in multi-bank memory systems[C]. Proceedings of the 39th Annual Symp. Design Automation Conference, New York, USA,2002:213-218.
    [11]C.GLyuh, T.Kim. Memory access scheduling and binding considering energy minimization in multi-bank memory systems [C]. Proceedings of the 41st Annual Conf. Design Automation Conference, New York, USA,2004:81-86.
    [12]Zhang Lei, Qiu Meikang, Sha Edwin, Zhuge Qingfeng. Variable assignment and instruction scheduling for processor with multi-module memory[J]. Journal of Microprocessor and Microsystem,2011,35(3):308-317.
    [13]Qiu Meikang, Guo Minyi, Liu Meiqin, Xue Chun Jason, et al. Loop scheduling and bank type assignment for heterogeneous multi-bank memory[J]. Journal of Parallel Distrib. Comput.,2009,69(6):546-558.
    [14]Diao Zhitao, Pakala Mahendra, Panchula Alex, et al. Spin-transfer switching in MgO-based magnetic tunnel junctions[J]. Journal of Applied Physics,2006, 99(1):213-223.
    [15]Qin Huifang, Cao Yu, Markovic Dejan, Vladimirescu Andrei, Rabaey Jan. SRAM Leakage Suppression by Minimizing Standby Supply Voltage[C]. Proceedings of the 5th International Symposium on Quality Electronic Design, Washington, USA,2004:55-66.
    [16]Yang Se-Hyun, Falsafi Babak, Powell Michael D., Roy Kaushik, Vijaykumar T. N. An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches[C]. Proceedings of the 7th International Symposium on High-Performance Computer Architecture, Washington, USA,2001:147-158.
    [17]S.Kirolos, Y. Massoud. Adaptive sram design for dynamic voltage scaling VLSI systems[C]. Proceedings of the 2007 Midwest Symp. On Circuits and Systesms, 2007:1297-1300.
    [18]S P.Nair, S. Eratne. A quasi-power-gated low-leakage stable sram cell[C]. Proceedings of the 2010 Midwest Symp. On Circuits and Systems,2010: 761-764.
    [19]C. Kim, J.Kim, S. Mukhopadhyay. A forward body-biased low-leakage sram cache:device, circuit and architecture considerations[J]. IEEE Transactions on VLSI Systems,2005,13(3):349-357.
    [20]Xue Chun Jason, Zhang Youtao, Chen Yiran, et al. Emerging non-volatile memories:opportunities and challenges[C]. Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, Taipei, Taiwan,2011:325-334.
    [21]G.Sun, Y.Zhang, Y.Wang, Y.Chen. Improving energy efficiency of write-asymmetric memories by log style write[C]. Proceedings of the 18th Annual Symp. on Low power electronics and design, New York,2012:173-178.
    [22]J.Chen, Ron.C.Chiang, H.H.Huang, G.Venkataramani. Energy-aware writes to non-volatile main memory[J]. ACM SIGOPS Operating Systems Review,2011, 45(2):48-52.
    [23]Zhang Wangyuan, Li Tao. Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast and Durable Memory Architectures[C]. Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, Washinton, USA, 2009:101-112.
    [24]M. Hosomi, H.Yamagishi, et al. A novel nonvolatile memory with spin torque transfer magnetization switching:Spin-RAM[C]. Proceedings of 2005 IEEE International IEDM Technical Digest electron Devices Meeting,2005:459-462.
    [25]Driskill-Smith. Latest advances and future prospects of STT-RAM, Presentaed at Non-Volatile Memories Workshop, http://nvmw.ucsd.edu/2010/documents/DriskillSmith_Alexander.pdf,2010.
    [26]Jadidi, M.Arjom, H.Sarbazi-Azad. High-Endurance and Performance-Efficient Design of Hybrid Cache Architectures through Adaptive Line Replacement[C]. Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design,2011:79-84.
    [27]X.Wu et al. Hybrid Cache Architecture with Disparate Memory Technologies[J]. ACM SIGARCH Computer Architecture News,2009,37(1):34-45.
    [28]Jianhua Li, Xue C.J., Yinlong Xu. STT-RAM based energy-efficiency hybrid cache for CMPs[C]. Proceedings of the 19th International Conference on VLSI and System-on-Chip (VLSI-SoC),2011:31-36.
    [29]Byung-Min Lee, Gi-Ho Park. Performance and energy-efficiency analysis of hybrid cache memory based on SRAM-MRAM[C]. Proceedings of the 2012 International SoC Design Conference,2012:247-250
    [30]Liu Tiantian, Zhao Yingchao, Xue Chun Jason, Li Minming. Power-aware variable partitioning for DSPs with hybrid PRAM and DRAM main memory[C]. Proceedings of the 48th Design Automation Conference, New York, USA, 2011:405-410.
    [31]Seok Hyunchul, Park Youngwoo, Park Kyu Ho. Migration based page caching algorithm for a hybrid main memory of DRAM and PRAM[C]. Proceedings of the 2011 ACM Symposium on Applied Computing, New York, USA, 2011:595-599.
    [32]Tian Wanyong, Li Jianhua, Zhao Yingchao, Xue Chun Jason, et al. Optimal task allocation on non-volatile memory based hybrid main memory[C]. Proceedings of the 2011 ACM Symposium on Research in Applied Computation, Florida, USA,2011:1-6.
    [33]Y.Joo, D.Niu, X.Dong, G.Sun, N.Chang, Y.Xie. Energy- and endurance-aware design of phase change memory caches[C]. Proceedings of the 13th International Conference on Design, Automation and Test in Europe, Belgium,2010: 136-141.
    [34]Diniz, D.Guedes, W.Meira, R.Bianchini. Limiting the power consumption of main memory[C]. Proceedings of the 34th Annual Symposium on Computer Architecture, New York, USA,2007:290-301.
    [35]Mittal Sparsh. A survey of architectural techniques for DRAM power management[J]. Journal of High Performance System Architecture,4(2), 2012:110-119.
    [36]Wu Donghong, He Bingsheng, Tang Xueyan, Xu Jianliang, Guo Minyi. RAMZzz:rank-aware dram power management with dynamic migrations and demotions[C]. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Utah, USA,2012:32-43.
    [37]Deng Qingyuan, Meisner David, Ramos Luiz, Wenisch Thomas F., Bianchini. MemScale:active low-power modes for main memory[J]. SIGPLAN Note,46(3), 2011:225-238.
    [38]A.N.Udipi, N.Muralimanohar, N.Chatterjee, R.Balasubramonian, A.Davisand, N.P.Jouppi. Rethinking DRAM design and organization for energy-constrained multi-cores[C]:Proceedings of the 37th Annual Symp. Computer Architecture, New York, USA,2010:175-186.
    [39]Ozturk, GChen, M.Kandemir, M.Karakoy. Cache miss clustering for banked memory systems[C]. Proceedings of the 2006 IEEE/ACM International Conference on Computer-aided design, California, USA,2006:244-250.
    [40]A.M.Amin, Z.A.Chishti. Rank-aware cache replacement and write buffering to improve DRAM energy efficiency[C]. Proceedings of the 16th International Symp. On Low power electronics and design, Austin Texas, USA, 2010:383-388.
    [41]H.Koc, O.Ozturk, M.Kandemir, E.Ercanli. Minimizing Energy Consumption of Banked Memories Using Data Recomputation[C]. Prococeedings of the 2006 International Symp. on Low Power Electronics and Design, New York, USA, 2006:358-361.
    [42]H.Huang, K.G.Shin, C.Lefurgy, T.Keller. Improving energy efficiency by making DRAM less randomly accessed[C]. Proceedings of the 2005 International Symp. on Low power electronics and design, San Diego, USA,2005:393-398.
    [43]Min Jung-Hi, Cha Hojung, Srini Vason P. Dynamic power management of DRAM using accessed physical addresses[J]. Journal of Microprocessor and Microsystem,2007,31(1):15-24.
    [44]Huang H., Shin K.G., Lefurgy C., et al. Software and hardware cooperative power management for main memory[C]. Proceedings of the 4th international conference on Power-Aware Computer Systems, Portland, OR,2005:61-77.
    [45]Wang Zhong, Hu Xiaobo. Power Aware Variable Partitioning and Instruction Scheduling for Multiple Memory Banks[C]. Proceedings of the conference on Design, automation and test in Europe, Washiton,, USA,2004:234-239.
    [46]V.Delaluz, A.Sivasubramaniam, M.Kandemir, N.Vijaykrishnan, M.J. Irwin. Scheuler-based DRAM energy management[C]. Proceedings of the 39th Annual Symp. Design Automation Conference, New York, USA,2002:697-702.
    [47]Zheng Hongzhong, Lin Jiang, Zhang Zhao, Zhu Zhichun. Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices[J]. SIGARCH Computor Architecture News,2009,37(3):255-266.
    [48]Vogelsang Thomas. Understanding the Energy Consumption of Dynamic Random Access Memories[C]. Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, Washington, USA, 2010:363-374.
    [49]Zhang Guangfei, Wang Huandong, Chen Xinke, Huang Shuai, Li Peng. Heterogeneous multi-channel:fine-grained DRAM control for both system performance and power efficiency[C]. Proceedings of the 49th Annual Design Automation Conference, California, USA,2012:876-881.
    [50]Gulur N. D., Manikantan R., Mehendale Mahesh, Govindarajan, R., Multiple. sub-row buffers in DRAM:unlocking performance and energy improvement opportunities[C]. Proceedings of the 26th ACM international conference on Supercomputing, Venice, Italy,2012:257-266.
    [51]Zheng Hongzhong, Lin Jiang, Zhang Zhao, Gorbatov Eugene, David Howard, Zhu Zhichun. Mini-rank:Adaptive DRAM architecture for improving memory power efficiency[C]. Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture, Washington, USA,2008:210-221.
    [52]Liu Song, Pattabiraman Karthik, Moscibroda Thomas, Zorn Benjamin G. Flikker: saving DRAM refresh-power through critical data partisioning[J]. ACM SIGARCH Computer Architecture News,2011,39(1):213-224.
    [53]Pattabiraman Karthik, Grover Vinod, Zorn Benjamin G Samurai:protecting critical data in unsafe languages[C]. Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems, Glasgow, Scotland UK,2008:219-232.
    [54]Carbin Michael, Rinard Martin C. Automatically identifying critical input regions and code in applications[C]. Proceedings of the 19th international symposium on Software testing and analysis, Trento, Italy,2010:37-48.
    [551 Liu Jamie, Jaiyen Ben, Veras Richard, Mutlu Onur. RAIDR:Retention-Aware Intelligent DRAM Refresh[C]. Proceedings of the 39th Annual International Symposium on Computer Architecture, Portland, Oregon,2012:1-12.
    [56]Alizadeh Mohammad, Javanmard Adel, Chuang Shang-Tse, et al. Versatile refresh:low complexity refresh scheduling for high-throughput multi-banked eDRAM[J]. SIGMETRICS Perform. Eval. Rev.,40(1),2012:247-258.
    [57]Venkatesan R.K., Herr S., Rotenberg E. Retention-Aware Placement in DRAM(RAPID):Software Methods for Quasi-Non-Volatile DRAM[C]. Proceedings of The Twelfth International Symposium on High-Performance Computer Architecture, Florida, USA,2006:155-165.
    [58]Z.Sun, X.Bi, H.H.Li, W.F.Wong, Z.L.Ong, X.Zhu, W. Wu. Multi retention level STT-RAM cache designs with a dynamic refresh scheme[C]. Proceedings of the 44th Annual Symp. on Microarchitecture, New York,2011:329-338.
    [59]Ghosh Mrinmoy, Lee Hsien-Hsin S. Smart Refresh:An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs[C]. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, Washington, USA,2007:134-145.
    [60]Joohee Kim, Papaefthymiou M.C. Block-based multi-period refresh for energy efficient dynamic memory[C]. Proceedings of the 14th Annual IEEE International ASIC/SOC Conference, Washiton, USA,2001:193-197.
    [61]Patel K., Benini L., Macii Enrico, Poncino Massimo. Energy-Efficient value-based selective refresh for embedded DRAMs[C]. Proceedings of the 15th international conference on Integrated Circuit and System Design:power and Timing Modeling, Optimization and Simulation, Berlin, Heidelberg, 2005:466-476.
    [62]X.Dong, X.Wu, G.Sun, Y.Xie, H.Li, Y.Chen. Circuit and Microarchitecture Evaluation of 3D Stacking Magnetic RAM (MRAM) as a Universal Memory Replacement[C]. Proceedings of the 45th Design Automation Conference, 2008:554-559.
    [63]F. Tabrizi. The Future of scalable STT-RAM as a universal embedded memory, Embedded.com, February 2007.
    [64]Clinton Wills, Vidyabhushan Mohan, Anurag Nigam, Sudhanva Gurumurthi, Mircea R. Stan. Relaxing Non-Volatility for Fast and Energy-Efficient STT-RAM Caches[C]. Proceedings of the 2011 High Performance Computer Architecture, Berkerly, USA,2011:50-61.
    [65]P.Zhou, B.Zhao, J.Yang, Y.Zhang, Energy reduction for STT-RAM using early write termination[C]. Proceedings of the 18th International Conference Computer-Aided Design, New York,2009:264-268.
    [66]Nigam Anurag, Clinton W., Mohan Vidyabhushan, Chen Eugene, Gurumurthi Sudhanva, Stan Mircea R. Delivering on the promise of universal memory for spin-transfer torque ram(STT-RAM)[C]. Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, Fukuoka, Japan, 2011:121-126.
    [67]Jadidi Amin, Arjomand Mohammad, Sarbazi-Azad Hamid. High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement[C]. Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, Fukuoka, Japan,2011:79-84.
    [68]Wu Xiaoxia, Li Jian, Zhang Lixin, et al. Power and performance of read-write aware hybrid caches with non-volatile memories[C]. Proceedings of the Conference on Design, Automation and Test in Europe, Nice, France, 2009:737-742.
    [69]Wu Xiaoxia, Li Jian, Zhang Lixin, et al. Hybrid cache architecture with disparate memory technologies [C]. Proceedings of the 36th annual international symposium on Computer architecture, TX, USA,2009:34-45.
    [70]Chen Yu-Ting, Cong Jason, Huang Hui, et al. Static and dynamic co-optimizations for blocks mapping in hybrid caches[C]. Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, California, USA,2012:237-242.
    [71]Li Qingan, Li Jianhua, Shi Liang, et al. MAC:migration-aware compilation for STT-RAM based hybrid cache in embedded systems[C]. Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, California, USA,2012:351-356.
    [72]Li Qingan, Zhao Mengying, Xue Chun Jason, He Yanxiang. Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache[J]. SIGPLAN Not.,2012,47(5):109-118.
    [73]Sun Guangyu, Yang Huazhong, Xie Yuan. Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs[J]. ACM Trans. Des. Autom. Electron. Syst., 2012,17(2):13-32.
    [74]Xie Yuan, Loh Gabriel H., Black Bryan, Bernstein Kerry. Design space exploration for 3D architectures[J]. Jourrnal of Emerging Technology Computing System,2006,2(2):65-103.
    [75]Sun Guangyu, Wu Xiaoxia, Xie Yuan. Exploration of 3D stacked L2 cache design for high performance and efficient thermal control[C]. Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design, New York, USA,2009:295-298.
    [76]Li Feihui, Nicopoulos Chrysostomos, Richardson, et al. Design and Management of 3D Chip Multiprocessors Using Network-in-Memory[J]. SIGARCH Comput. Archit. News,,2006,34(2):130-141.
    [77]X.Dong, X.Wu, G.Sun, Y.Xie, H.Li, Y.Chen. A Novel Architecture of the 3D Stacked MRAM L2 Cache for CMPs[C]. Proceedings of the 15th International Symposium on High Performance Computer Architecture,2009:239-249.
    [73]Jingtong Hu, Xue C.J., Wei-Che Tseng, He Y., Meikang Qiu, Sha E.H.-M. Reducing write activities on non-volatile memories in embedded CMPs via data migration and recomputation[C]. Proceedings of the 47th Design Automation Conference, New York, USA,2010:350-355.
    [79]T. Ishigaki, T. Kawahara, R. Takemura, K. Ono, et al. A Multi-level-cell Spin-transfer Torque Memory with Series-stacked Magnetotunnel Junctions[C]. Porceedings of the 2010 Symposium on VLSI Technology, New York, USA, 2010:47-48.
    [80]Yaojun Zhang, Lu Zhang, Wujie Wen, Guangyu Sun, Yiran Chen. Multi-level cell STT-RAM:Is it realistic or just a dream?[C]. Porceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), New York, USA,2012:526-532.
    [81]J.-H. Park, et al.8 Gb MLC (multi-level cell) NAND flash memory using 63 nm process technology[C]. Proceedings of the 2004 IEEE Int'l Electron Devices Meeting, New York, USA,2004:873-876.
    [82]Y.Chen, X.Wang, W.Zhu, H.Li, Z.Sun, G.Sun, Y. Xie. Access scheme of Multi-Level Cell Spin-Transfer Torque Random Access Memory and its optimization[C].2010 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS),2010:1109-1112.
    [83]Y.Chen, W.F.Wong, H.Li, C.K.Koh. Processor caches built using multi-level spin-transfer torque RAM cells[C].2011 International Symposium on Low Power Electronics and Design,2011:73-78.
    [84]L.Jiang, B.Zhao, Y.Zhang, J.Yang. Constructing large and fast multi-level cell STT-MRAM based cache for embedded processors [C]. Pceedings of the 49th Annual Design Automation Conference, New York, USA,2012:907-912.
    [85]J.-T. Lin, Y.-B. Liao, M.-H. Chiang, W.-C. Hsu. Operation of multi-level phase change memory using various programming techniques[C]. Proceedings of the 2009 IEEE International Conference on IC Design and Technology, New York, USA,2009:199-202.
    [86]Bedeschi, F., Fackenthal R., Resta C., et al. A Multi-Level-Cell Bipolar-Selected Phase-Change Memory[C]. Proceedings of the IEEE International Solid-State Circuits Conference, New York, USA,2008:428-625.
    [87]B.C.Lee, E.Ipek, O.Mutlu, D.Burger, Architecting phase change memory as a scalable DRAM alternative[J]. SIGARCH Computer Architecture News, New York, USA,2009,37(4):2-13.
    [88]M. Qureshi, M. Franceschini, L. Lastras-Montano. Improving read performance of phase change memories via write cancellation and write pausing[C]. Proceedings of the High-Performance Computer Architecture, New York, USA, 2010:1-11.
    [89]Qureshi Moinuddin K.m, Franceschini Michele M., Jagmohan Ashish, Lastras Luis A. PreSET:improving performance of phase change memories by exploiting asymmetry in write times[J]. SIGARCH Comput. Archit. News,2012, 40(3):380-391.
    [90]Zhang Xi, Hu Qian, Wang Dongsheng, Li Chongmin, Wang Haixia. A read-write aware replacement policy for phase change memory[C]. Proceedings of the 9th international conference on Advanced parallel processing technologies, Shanghai, China,2011:31-45.
    [91]Wei Xu, Tong Zhang. A Time-Aware Fault Tolerance Scheme to Improve Reliability of Multilevel Phase-Change Memory in the Presence of Significant Resistance Drift[J]. IEEE Transactions on Very Large Scale Integration Systems, 2011,19(8):1357-1367.
    [92]Lee Suyoun, Jeong Jeung-hyun, Lee Taek Sung, et al. A Study on the Failure Mechanism of a Phase-Change Memory in Write/Erase Cycling[J]. IEEE Electron Device Letters,2009,30(5):448-450.
    [93]Braga S., Cabrini A., Torelli G. Experimental Analysis of Partial-SET State Stability in Phase-Change Memories[J]. IEEE Transactions on Electron Devices, 2011,58(2):517-522.
    [94]Ferreira Alexandre P., Zhou Miao, Bock Santiago, et al. Increasing PCM main memory lifetime[C]. Proceedings of the Conference on Design, Automation and Test in Europe, Dresden, Germany,2010:914-919.
    [95]Qureshi, Moinuddin K. Pay-As-You-Go:low-overhead hard-error correction for phase change memories[C]. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, Porto Alegre, Brazil, 2011:318-328.
    [96]P.Zhou, B.Zhao, J.Yang, Y.Zhang. A durable and energy efficient main memeory using phase change memory technology[C]. Proceeding of the 36th Annual Symp. on Computer architecture, New York,2009:14-23.
    [97]S.Cho, H.Lee. Flip-N-Write:a simple deterministic technique to improve PRAM write performance, energy and endurance[C]. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture,2009:347-357.
    [98]Lei Jiang, Youtao Zhang, Jun Yang. Enhancing phase change memory lifetime through fine-grained current regulation and voltage upscaling[C]. Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, Fukuoka, Japan,2011:523-528.
    [99]Prasanth Mangalagiri, Karthik Sarpatwari, Aditya Yanamandra. A low-power phase change memory based hybrid cache architecture[C]. Proceedings of the 18th ACM Great Lakes symposium on VLSI, Florida, USA,2008:200-207.
    [100]Fackenthal R., Resta C., Donze E.M., Jagasivamani M., et al. A multi-level-cell bipolar-selected phase-change memory[C]. Proceedings of 2008 International Solid-State Circuits Conference, Florida, USA,2008:428-625.
    [101]Nirschl T., Philipp J.B., Happ T.D., Burr, G.W., et al. Write strategies for 2 and 4-bit multi-level phase-change memory[C]. Proceedings of the 2008 International Electron Devvices Meeting, Washinton, USA,2007:461-464.
    [102]Jiang Lei, Zhang Youtao, Yang Jun. ER:elastic RESET for low power and long endurance MLC based phase change memory[C]. Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, California, USA,2012:39-44.
    [103]Dhiman Gaurav, Ayoub Raid, Rosing Tajana. PDRAM:a hybrid PRAM and DRAM main memory system[C]. Proceedings of the 46th Annual Design Automation Conference, California, USA,2009:664-669.
    [104]Qureshi Moinuddin K., Srinivasan Vijayalakshmi, Rivers Jude A. Scalable high performance main memory system using phase-change memory technology[C]. Proceedings of the 36th annual international symposium on Computer architecture, Austin, USA,2009:24-33.
    [105]Seok Hyunchul, Park Youngwoo, Park Kyu Ho. Migration based page caching algorithm for a hybrid main memory of DRAM and PRAM[C]. Proceedings of the 2011 ACM Symposium on Applied Computing, TaiChung, Taiwan, 2011:595-599.
    [106]Seok Hyunchul, Park Youngwoo, Park Ki-Woong, Park Kyu Ho. Efficient page caching algorithm with prediction and migration for a hybrid main memory[J]. SIGAPP Appl. Comput. Rev.,2011,11(4):38-48.
    [107]Ramos Luiz E., Gorbatov Eugene, Bianchini Ricardo. Page placement in hybrid memory systems[C]. Proceedings of the international conference on Supercomputing, Arizona, USA,2011:85-95.
    [108]Baek Seungcheol, Lee Hyung Gyu, Nicopoulos Chrysostomos, Kim Jongman. A Dual-Phase Compression Mechanism for Hybrid DRAM/PCM Main Memory Architectures[C]. Proceedings of the great lakes symposium on VLSI, Utah, USA,2012:345-350.
    [109]Du Yu, Zhou Miao, Childers Bruce, Melhem Rami, Moss{\'e} Daniel. Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory[J]. ACM Transaction on Architecture and Code Optimization,2013,9(6):55-74.
    [110]Kharbutli Mazen, Solihin Yan. Counter-Based Cache Replacement and Bypassing Algorithms[J]. IEEE Transaction on Computers,2008, 57(4):433-447.
    [111]Lai An-Chow, Fide Cem, Falsafi Babak. Dead-block prediction and dead-block correlating prefetchers[C]. Proceedings of the 28th annual international symposium on Computer architecture, Teborg, Sweden,2001:144-154.
    [112]Qureshi Moinuddin K., Suleman M. Aater, Patt, Yale N. Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines[C]. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, Washington, USA,2007:250-259.
    [113]Abella Jaume, Vera Xavier, O'Boyle, Michael F. P. IATAC:a smart predictor to turn-off L2 cache lines[J].2005,2(1):55-77.
    [114]Jaleel A., Theobald K. B., Steely S. C., et al. High Performance Cache Replacement using re-reference interval prediction[C]. Proceedings of the 37th annual international symposium on computer architecture, New York, USA, 2010:60-71.
    [115]Lin Jiang, Zheng Hongzhong, Zhu Zhichun, et al. Software thermal management of dram memory for multicore systems[C]. Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, MD, USA,2008:337-348.
    [116]Liu Lei, Cui Zehan, Xing Mingjie, et al. A software memory partition approach for eliminating bank-level interference in multicore systems[C]. Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Minnesota, USA,2012:367-376.
    [117]Bhadauria Major, McKee Sally A. An approach to resource-aware co-scheduling for CMPs[C]. Proceedings of the 24th ACM International Conference on Supercomputing, Ibaraki, Japan,2010:189-199.
    [118]Zhuravlev Sergey, Saez Juan Carlos, Blagodurov Sergey, et al. Survey of scheduling techniques for addressing shared resources in multicore processors[J]. ACM Computing Surveys,2012,45(1):4-31.
    [119]Iyer Ravi, Zhao Li, Guo Fei, et al. QoS policies and architecture for cache/memory in CMP platforms[J]. SIGMETRICS Perform. Eval. Rev.,2007, 35(1):25-36.
    [120]Guo Fei, Solihin Yan, Zhao Li, Iyer Ravishankar. Quality of service shared cache management in chip multiprocessor architecture[J]. ACM Trans. Archit. Code Optim.,2010,7(3):14-46.
    [121]A.Merkel, J.Stoess, F.Bellosa. Resource-conscious scheduling for energy efficiency on multicore processors[C]. Proceedings of the 5th European conference on Computer systems, Paris, France,2010:153-166.
    [122]Ohsawa Taku, Kai Koji, Murakami Kazuaki. Optimizing the DRAM refresh count for merged DRAM/logic LSIs[C]. Proceedings of the 1998 international symposium on Low power electronics and design, California, USA,1998:82-87.
    [123]Patel K., Benini L., Macii Enrico, Poncino Massimo. Energy-Efficient value-based selective refresh for embedded DRAMs[C]. Proceedings of the 15th international conference on Integrated Circuit and System Design:power and Timing Modeling, Optimization and Simulation, Leuven, Belgium, 2005:466-476.
    [124]Moshnyaga Vasily G., Vo Hoa, Reinman Glenn, Potkonjak Miodrag. Handheld system energy reduction by OS-driven refresh[C]. Proceedings of the 16th international conference on Integrated Circuit and System Design:power and Timing Modeling, Optimization and Simulation, Montpellier, France, 2006:24-35.
    [125]He Zhengting, Mok Aloysius. Fast co-simulation of transformative systems with OS support on SMP computer[C]. Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, Stockholm, Sweden,2004:164-169.
    [126]Jog Rajeev, Vitale Philip L., Callister James R. Performance evaluation of a commercial cache-coherent shared memory multiprocessor[J]. SIGMETRICS Perform. Eval. Rev.,1990,18(1):173-182.
    [127]Bhadauria Major, McKee Sally A. An approach to resource-aware co-scheduling for CMPs[C]. Proceedings of the 24th ACM International Conference on Supercomputing, Ibaraki, Japan,2010:189-199.
    [128]Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan. ADAPT:A framework for coscheduling multithreaded programs[J]. ACM Transactions on Architecture and Code Optimization,2013,9(4):1-24.
    [129]J.Kleinberg, E.Tardos. Algorithm Design[B]. Addison-Wesley Longman, Boston,2005.
    [130]H.Gabow, R.E.Tarjan. Faster scaling algorithms for general graph-matching problems[J]. Journal of the ACM,1991,38(2):815-853.
    [131]M.Garey, D.Johnson. Computers and Intractability[B]. Freeman, San Francisco, 1979.
    [132]R.Sbiaa, R.Law, S.Y.H.Lua, E.L.Tan, et al. Spin Transfer Torque Switching for Multi-bit Per Cell Magnetic Memory with Perpendicular Anisotropy[J]. Applied Physics Letters,2011,99(9):506-508.
    [133]X.Dong, C.Xu, Y.Xie, N.Jouppi. NVSim:A circuit-level performance, energy, and area model for emerging non-volatile memory[J]. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems,2012, 31(2):994-1007.
    [134]P.Shivakumar, N.P.Jouppi. CACTI 3.0:An integrated cache timing, power and area model. WRL Technical Report,2001.
    [135]HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael Harding, Onur Mutlu. Row Buffer Locality Aware Caching Policies for Hybrid Memories[C]. Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Quebec, Canada,2012:245-252.
    [136]Donghyuk Lee, Yoongu Kim, Vivek Seshadri, et al. Tiered-Latency DRAM:A Low Latency and Low Cost DRAM Architecture[C]. Proceedings of the 19th International Symposium on High-Performance Computer Architecture, Shenzhen, China,2013:245-256.
    [137]N.Binkert, B.Beckmann, B.Black, S.K.Reinhardt, A.Saidi, et al. The gem5 simulator[J]. SIGARCH Computer Architecture News,2011,39(1):1-7.
    [138]Texas Instruments Corp. OMAP™ 4 PandaBoard System Reference Manual.
    [139]The ARM Cortex-A9 Processors specification, http://arm.com/zh/files/pdf/ARMCortexA-9Processors.pdf.
    [140]Binkert N.L, Dreslinski R.G. The M5 Simulator:Modeling Networked Systems[J]. IEEE Journal of Microarchitecture,2006,8(3):52-60.
    [141]Martin M. M. K., Sorin. Multifacet's general execution-driven multiprocessor simulator(GEMS) toolset. ACM SIGARCH computing,2005,7(1):92-99.
    [142]J.Renau, B.Fraguela, J.Tuck, W.Liu, M.Prvulovic, L.Ceze, S.Sarangi, P.Sack, K.Strauss, P.Montesinos. SESC simulator, http://sesc.sourceforge.net,2005.
    [143]D.Wang, B.Ganesh, N.Tuaycharoen, et al. DRAMsim:a memory system simulator[J]. SIGARCH Computer Architecture News,2005,33(2):100-107.
    [144]C.Lee, M.Potkonjak, W.H.Mangione-Smith. Mediabench:a tool for evaluating and synthesizing multimedia and communications systems[C]. Proceedings of the 30th Annual Symp. Microarchitecture, Washington,1997:330-335.
    [145]N.Tuck, D.M.Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor[C]. Proceedings of 12th International Conf. on Parallel Architectures and Compilation Techniques, Louisiana,2003:26-37.
    [146]Micron Technology Inc. 1GB mobile LPDDR. Available from http://www.micron.com/products/partdetail?part=MT46H32M32LFCG-5IT.
    [147]TI OMAP 4430. www.ti.com.cn/.
    [148]Meza Justin, Chang Jichuan, Yoon HanBin, et al. Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management[J]. IEEE Comput. Archit. Lett.,2012,11(2):61-64.
    [149]Park Hyunsun, Yoo Sungjoo, Lee Sunggu. Power management of hybrid DRAM/PRAM-based main memory[C]. Proceedings of the 48th Design Automation Conference, California, USA,2011:59-64.