Research on Thread-based Data Prefetching Techniques

English title: Research on Thread-based Data Prefetching Techniques
Author: Ou Guodong (欧国东)
Degree level: Doctoral
Discipline: Computer Science and Technology
Keywords: multi-threaded processor; single-threaded application; critical memory instructions; thread-based multi-path data prefetching; thread evaluation; incorrect speculation; confidence
Degree year: 2011
Advisor: Zhang Minxuan (张民选)
Discipline code: 081202
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2011-03-01

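Abstract

Multi-threaded processors have become mainstream in the market. However, because parallel development environments are still immature, a large body of legacy code, as well as new code written under a sequential model, cannot use the multiple hardware contexts of a multi-threaded processor to execute in parallel; instead, it may run slower because it competes with other threads for shared resources. Accelerating single-threaded applications such as legacy code on multi-threaded processors has therefore become an active topic in processor architecture research. Thread-based data prefetching uses idle contexts to run data prefetch threads, which compute the addresses of memory instructions and issue prefetches. This can improve memory system behavior, accelerate single-threaded applications, and raise system throughput.
     Thread-based data prefetching inherits from and extends traditional data prefetching in a multi-threaded environment, and it also extends and enhances the multi-threaded architecture.
     This dissertation comprehensively studies data prefetching, in particular data prefetching in current multi-threaded environments. Building on an in-depth analysis of the state of research on multi-threaded execution and thread-assisted execution, it investigates thread-based data prefetching. The main contributions are as follows:
     1. It analyzes the memory access behavior of applications, defines problem memory instructions and critical memory instructions, and proposes a technique for handling critical memory instructions: thread-based multi-path data prefetching.
     2. It proposes a two-phase evaluation strategy for data prefetch threads, which evaluates the prefetching effect of each thread during both its construction and its execution and selects the more efficient threads.
     3. It systematically analyzes control-flow behavior in thread-based multi-path data prefetching and proposes an optimized multi-path data prefetching method based on incorrect speculation, in which branch instructions control the behavior of data prefetch threads to improve prefetch accuracy, reduce the number of useless prefetches, and lower cache pollution.
     4. It proposes a confidence-based control mechanism for data prefetch threads, which uses confidence to strengthen branch prediction, especially multi-branch prediction, and to control the extraction, spawning, and execution of data prefetch threads.
     The technique was implemented and validated on two different simultaneous multithreading processors (a conventional superscalar SMT processor and an explicitly parallel instruction computing SMT processor) and on a single-chip multiprocessor. Experiments show that thread-based multi-path data prefetching improves memory system behavior and accelerates single-threaded applications while also effectively raising system throughput. The technique has been applied in key projects of the National High Technology Research and Development Program of China (2002AA110020, 2005AA110020) and in a National Natural Science Foundation of China project (60376018).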

English abstract

With the development of technology and architecture, multi-threaded processors have gradually become the mainstream microprocessors. However, much legacy code, as well as new code written under a sequential model, cannot take advantage of multiple contexts, and its performance may even degrade because of competition with other applications. Approaches to improving the performance of single-threaded applications on multi-threaded processors have become a focus of research.
     Thread-based data prefetching is an effective way to ease the memory wall in a multithreaded environment. Data prefetch threads extracted from the main program execute in parallel with the main program thread and typically generate access addresses much faster, since they execute only the dependent address computations. Thread-based data prefetching develops and complements traditional data prefetching techniques, and it also extends and enhances the multi-threaded architecture.
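
The idea of a prefetch thread that runs only the address computations can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the dissertation's simulator: the LRU cache model, the `lead` parameter (how many nodes the prefetch slice runs ahead of the main thread), and all names below are hypothetical, and timing is ignored entirely.

```python
from collections import OrderedDict


class ToyCache:
    """Tiny fully associative LRU cache of line addresses (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Access addr and return True on a hit; on a miss, insert and evict LRU."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        else:
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)
        return hit


def count_main_misses(next_ptr, start, steps, capacity, lead):
    """Walk a linked structure and count cache misses seen by the main thread.

    lead is the (assumed) number of nodes the prefetch slice stays ahead;
    lead=0 disables prefetching.
    """
    cache = ToyCache(capacity)
    main_node = start
    helper_node = start
    helper_pos = 0
    misses = 0
    for i in range(steps):
        # Prefetch slice: only the pointer chase (address computation), no payload work.
        while lead and helper_pos < min(i + lead, steps):
            cache.touch(helper_node)
            helper_node = next_ptr[helper_node]
            helper_pos += 1
        # Main thread: the real access, which may hit a prefetched line.
        if not cache.touch(main_node):
            misses += 1
        main_node = next_ptr[main_node]
    return misses


# A 64-node chain whose nodes are scattered over 64 line addresses.
ORDER = [(i * 37) % 64 for i in range(64)]
NEXT_PTR = {ORDER[i]: ORDER[(i + 1) % 64] for i in range(64)}

for lead in (0, 4, 16):
    print(lead, count_main_misses(NEXT_PTR, ORDER[0], 64, capacity=8, lead=lead))
```

In this toy model with an 8-line cache, the main thread misses on all 64 accesses without prefetching and on none with `lead=4`. With `lead=16` the prefetched lines are evicted before they are used, so all 64 accesses miss again, which is the kind of useless prefetching and cache pollution that motivates controlling prefetch threads.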
     This dissertation studies data prefetching techniques, focusing on thread-based data prefetching and related optimizations. The main contributions are as follows:
     1. Analyze the memory access behavior of applications and define problem memory instructions and critical memory instructions. Propose a data prefetching technique using helper threads: Thread-based Multi-Path Data Prefetching (TMPDP).
     2. Propose a two-phase evaluation scheme that evaluates the effect of data prefetch threads during both their construction and their execution.
     3. Propose an optimized data prefetching technique using incorrect speculation. The influence of control flow in data prefetch threads is also analyzed. Branch instructions in data prefetch threads are used to control the execution flow. Such accurate control helps to improve prefetch accuracy and to reduce useless prefetches and cache pollution.
     4. Propose a confidence-based control scheme. This scheme uses confidence to enhance branch prediction, especially multi-branch prediction, and to control data prefetch thread extraction, spawning, and execution.
     In this dissertation we implement and validate TMPDP on two kinds of platform: simultaneous multithreading (SMT) processors (one a superscalar SMT processor, the other an Explicitly Parallel Instruction Computing processor with an SMT extension) and single-chip multiprocessors (CMP). Simulation results show that TMPDP not only accelerates single-threaded applications but also improves system throughput. TMPDP has already been applied in projects supported by the National High Technology Research and Development Program of China (2002AA110020, 2005AA110020) and the National Natural Science Foundation of China (60376018).
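
To make the confidence-based control in contribution 4 more concrete, the sketch below uses a resetting saturating counter, a common confidence-estimation scheme for branch predictions. The abstract does not describe the dissertation's actual mechanism, so the counter design, the threshold, and the path-selection policy (`paths_to_prefetch`) are all assumptions made for illustration.

```python
class ResettingConfidence:
    """Per-branch resetting counter: counts consecutive correct predictions."""

    def __init__(self, threshold=4, max_count=15):
        self.threshold = threshold    # assumed cut-off for "high confidence"
        self.max_count = max_count    # saturation limit of each counter
        self.counters = {}            # branch PC -> consecutive correct predictions

    def is_confident(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, prediction_correct):
        if prediction_correct:
            self.counters[pc] = min(self.counters.get(pc, 0) + 1, self.max_count)
        else:
            self.counters[pc] = 0     # any misprediction resets the counter


def paths_to_prefetch(conf, pc, predicted_taken):
    """Hypothetical policy: prefetch along the predicted path only when the
    prediction is trusted, otherwise along both directions of the branch."""
    if conf.is_confident(pc):
        return [predicted_taken]
    return [predicted_taken, not predicted_taken]


conf = ResettingConfidence(threshold=2)
print(paths_to_prefetch(conf, 0x40, True))   # unknown branch: both paths
conf.update(0x40, True)
conf.update(0x40, True)
print(paths_to_prefetch(conf, 0x40, True))   # trusted branch: predicted path only
```

Under this assumed policy, a low-confidence branch triggers multi-path prefetching while a high-confidence branch spawns a single prefetch path, trading coverage against useless prefetches.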
