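Research on Architecture-Level Cache Power Optimization Techniques (体系结构级Cache功耗优化技术研究)

English title: Research on Power Optimization of Cache Architecture Design
Author: Xiang Xiaoyan (项晓燕)
Thesis level: Doctoral
Discipline: Circuits and Systems
Keywords (translated from Chinese): cache ; architecture ; power optimization ; instruction cache ; data cache ; reconfigurable cache
English keywords: Cache ; Architecture Level ; Power Optimization ; Instruction Cache ; Data Cache ; Configurable Cache
Degree year: 2013
Advisor: Yan Xiaolang (严晓浪)
Discipline code: 080902
Degree-granting institution: Zhejiang University (浙江大学)
Thesis submission date: 2013-04-01
Defense committee chair: Wu Weilin (吴为麟)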

Abstract

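With advances in integrated-circuit manufacturing processes and improvements in microprocessor performance, microprocessor power consumption has become an increasingly serious problem and a major bottleneck constraining microprocessor development. The on-chip cache accounts for a significant share of microprocessor power, so reducing cache power has become a primary goal in controlling microprocessor power. Because low-level power optimization techniques are constrained by process technology and the physical properties of materials, they can hardly meet the cache's power constraints, and cache power must therefore be optimized at a higher level. Starting from the composition of cache power, cache access characteristics, and the balance between power and performance, this thesis proposes several architecture-level cache power optimization methods. The main research work and contributions are:
     Low-power instruction cache. Observing that the offsets between successively accessed instruction cache lines exhibit strong locality, this thesis proposes a low-power instruction cache access method that links the currently accessed cache line to several of its immediately adjacent lines. When a relative jump occurs, the method uses the access links between adjacent lines to obtain the exact way of the jump target line, which reduces accesses to the cache tag and data memories and thereby lowers instruction cache dynamic power. When a cache line is replaced, only the links between the adjacent lines and the replaced line need to be checked and cleared, so link correctness is maintained at very small hardware cost.
     Low-power data cache. To address the power cost of accessing the data cache and the load-store queue in parallel, and the performance cost of accessing them serially, this thesis proposes a low-power method that filters useless data cache accesses by predicting load-store queue accesses. Exploiting the predictability of memory dependences, the method records, for each load instruction, the set of instructions in the load-store queue on which it has a memory dependence. It then predicts which subsequent loads need to access only the load-store queue, obtains their results directly from the load-store queue forwarding data path, and disables the data cache access.
     Cache reconfiguration algorithm. To address the overhead of the reconfiguration search in reconfigurable caches, this thesis proposes a reconfiguration prediction algorithm that triggers cache reconfiguration on function transfers. Because a function transfer marks entry into a new program segment, the algorithm monitors changes in the cache miss rate dynamically at function granularity and uses each function's historically optimal cache configuration parameters to predict the reconfiguration settings for subsequent function executions, which reduces the search of the cache design space during reconfiguration. In addition, by distinguishing cache lines filled before reconfiguration from those filled after it, the reconfigured cache can continue to use data cached before reconfiguration, which reduces the latency and power of cache initialization.
     Useless cache accesses. Branch mispredictions cause useless instruction cache accesses. This thesis proposes a low-power instruction cache method based on zero-delay branch prediction, in which predicted branch outcomes take part in subsequent branch predictions. This eliminates the branch history aliasing caused by the high branch penalty in deeply pipelined and superscalar processors, improves branch prediction accuracy, and reduces the power spent on useless instruction cache accesses.
     The architecture-level cache power optimization methods proposed in this thesis effectively reduce cache power without affecting performance and improve the microprocessor's performance-to-power ratio.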
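
As an illustration of the linked-line instruction cache idea, the following is a minimal behavioral sketch in Python, not the thesis's hardware design. It assumes a small set-associative cache addressed by line number, links only to the two immediately adjacent lines (offsets -1 and +1), round-robin replacement, and at least two sets; the class and field names (`LinkedICache`, `links`, `tag_lookups`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    addr: int                                  # line address (tag and index combined)
    links: dict = field(default_factory=dict)  # offset (-1 or +1) -> way of that neighbour


class LinkedICache:
    """Behavioral model only; assumes num_sets >= 2 and round-robin replacement."""

    def __init__(self, num_sets=4, num_ways=2):
        self.num_sets, self.num_ways = num_sets, num_ways
        self.sets = [[None] * num_ways for _ in range(num_sets)]
        self.next_victim = [0] * num_sets
        self.tag_lookups = 0     # counts full tag comparisons (the costly path)
        self.current = None      # (addr, way) of the most recently fetched line

    def _find(self, addr):
        for way, line in enumerate(self.sets[addr % self.num_sets]):
            if line is not None and line.addr == addr:
                return way
        return None

    def _unlink(self, victim_addr):
        # Only the two neighbours of the evicted line can hold links to it.
        for offset in (-1, 1):
            neighbour = victim_addr - offset   # its links[offset] points at the victim
            way = self._find(neighbour)
            if way is not None:
                self.sets[neighbour % self.num_sets][way].links.pop(offset, None)

    def _fill(self, addr):
        s = addr % self.num_sets
        way = self.next_victim[s]
        self.next_victim[s] = (way + 1) % self.num_ways
        victim = self.sets[s][way]
        if victim is not None:
            self._unlink(victim.addr)
        self.sets[s][way] = Line(addr)
        return way

    def fetch(self, addr):
        """Return True if the target way came from a link (no tag lookup)."""
        cur_line, offset = None, None
        if self.current is not None:
            cur_addr, cur_way = self.current
            cur_line = self.sets[cur_addr % self.num_sets][cur_way]
            offset = addr - cur_addr
            if offset in cur_line.links:
                self.current = (addr, cur_line.links[offset])
                return True
        # Slow path: compare tags in every way, refill on a miss.
        self.tag_lookups += 1
        way = self._find(addr)
        if way is None:
            way = self._fill(addr)
        if cur_line is not None and offset in (-1, 1):
            cur_line.links[offset] = way       # remember the neighbour's way
        self.current = (addr, way)
        return False
```

In this sketch a repeated jump between two adjacent lines pays for a tag lookup only until the link is recorded, and evicting a line clears at most two links, which mirrors the small invalidation cost described above.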
With advances in IC manufacturing technology and the growing functionality of microprocessors, power consumption has become more serious and is now the main obstacle to improving microprocessor performance. High power consumption not only increases manufacturing cost but also affects a microprocessor's stability and reliability. The on-chip cache consumes a significant share of a microprocessor's energy, so designing an energy-efficient on-chip cache is a main objective as feature sizes shrink and cache capacity and associativity increase. Because circuit-level and logic-level low-power techniques depend heavily on process technology and the physical characteristics of materials, they cannot meet the energy constraints of on-chip caches, and architecture-level techniques for reducing on-chip cache power must be considered. In this thesis, we propose several power optimizations for on-chip cache architecture design, based on the components of cache power consumption, the access characteristics of different caches, and the balance between power and performance. The main contributions are as follows:
     1. Low-power instruction cache design. Set-associative instruction caches consume a large portion of the power in modern microprocessors. This thesis analyzes cache access behavior and finds that most accesses are sequential accesses or short-distance branches whose targets lie in an adjacent cache line. It therefore proposes a new low-power instruction cache architecture that records link information between the current cache line and its adjacent cache lines. When a cache access occurs, the cache reuses the adjacent-line links to obtain the way of the target line, so it can access a single way of the data array directly and avoid tag lookups, which reduces power consumption. When a cache line is evicted, only the links of its adjacent cache lines need to be checked and invalidated to keep the links correct.
     2. Low-power data cache design. Accessing the data cache in parallel with the load-store queue consumes a large amount of energy, while accessing them serially increases load-to-use latency. This thesis proposes a low-power data cache based on load-store-queue access prediction that filters out unnecessary data cache accesses. A memory dependency set is defined to record, for each load, the loads and stores residing in the load-store queue on which it depends. When a load instruction is fetched, its memory dependency set is checked. A load that is predicted to need only the load-store queue obtains its result from the load-store-queue forwarding data path and skips the data cache access. As a result, the data cache based on load-store-queue access prediction reduces power consumption without performance loss.
     3. Cache configuration algorithm. Configurable caches suffer from tuning intervals that do not closely match the phase changes of an application and from high configuration overhead. This thesis proposes a configuration prediction algorithm based on subroutine calls to improve the tuning interval and reduce the overhead. Because cache requirements may vary greatly across subroutines, the miss rate is checked when a subroutine is called. If the miss rate exceeds a threshold, the cache is tuned using the historically optimal cache parameters of that subroutine. Furthermore, a mechanism for reusing cache lines across tuning is proposed: by identifying whether each cache line belongs to the pre-configuration or post-configuration cache, it reduces the performance loss and power consumption of cache initialization.
     4. Reducing unnecessary instruction cache accesses. Because branch mispredictions cause unnecessary instruction cache accesses, this thesis proposes a low-power instruction cache based on a zero-delay branch prediction mechanism. Predicted branch behavior is used to predict subsequent branches, which eliminates branch history aliasing in deeply pipelined and superscalar microprocessors. Branch prediction accuracy improves, and the power spent on unnecessary cache accesses is reduced.
     The techniques proposed in this thesis achieve substantial power savings without reducing performance and also improve the energy efficiency of the microprocessor.
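
The subroutine-triggered configuration prediction in contribution 3 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the thesis's algorithm: the candidate configurations, the `evaluate` cost function, and the fallback to an exhaustive search when a subroutine has no history are all hypothetical, as are the names `SubroutineTuner` and `on_call`.

```python
class SubroutineTuner:
    """Pick a cache configuration at each subroutine call (illustrative only)."""

    def __init__(self, configs, threshold, evaluate):
        self.configs = configs        # hypothetical design space, e.g. (size_kb, ways)
        self.threshold = threshold    # miss-rate level that triggers tuning
        self.evaluate = evaluate      # hypothetical cost: (subroutine, config) -> number
        self.history = {}             # subroutine -> best configuration found so far
        self.current = configs[0]
        self.searches = 0             # how many full design-space searches were run

    def on_call(self, subroutine, miss_rate):
        # A miss rate at or below the threshold means the current configuration is kept.
        if miss_rate <= self.threshold:
            return self.current
        best = self.history.get(subroutine)
        if best is None:
            # No history yet: search once (assumed fallback) and remember the result.
            self.searches += 1
            best = min(self.configs, key=lambda c: self.evaluate(subroutine, c))
            self.history[subroutine] = best
        # With history, the stored configuration is reused and no search is needed.
        self.current = best
        return best
```

The point of the sketch is that `searches` grows with the number of distinct subroutines that exceed the threshold, not with the number of calls, which is where the reduction in configuration overhead would come from.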

