Research on Acceleration Strategies for Processor Microarchitecture Simulation
Abstract
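When designing a new processor, architects must search a very large design space for the best design. The quality of each design alternative is judged with cycle-accurate microarchitecture simulators. However, the speed of existing simulators has long been a bottleneck. It seriously limits how many design alternatives architects can evaluate in their search for the best one, and in multi-core and many-core processor design the bottleneck is even more severe.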
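     Existing acceleration methods warm up and simulate in detail only some of a benchmark's dynamic instruction segments. This speeds up simulation, but two key problems remain: (1) how to select these dynamic instruction segments in a way that is both simple and sound; (2) how to keep the warm-up as short as possible. It is therefore necessary to study new acceleration strategies for processor microarchitecture simulation. The two-stage systematic sampling strategy and the Cantor simulation strategy provide effective support for selecting the instruction segments, and the functional warm-up acceleration strategy effectively shortens the warm-up.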
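     The behavior of benchmark programs is not random. It is periodic, and the period differs from program to program. Traditional sampling simulation methods handle this in one of two ways. Some ignore the periodicity, so they simulate many redundant instructions in detail and run more slowly. Others try to account for it strictly but lack an effective means of capturing it. To address these problems, two-stage systematic sampling splits the selection of detailed-simulation instructions into two stages. In the first stage, the benchmark's dynamic instruction stream is divided into equal, relatively long segments, and a number of them are selected at equal intervals as candidate detailed-simulation segments. In the second stage, each candidate is divided into equal, shorter segments, and a number of these are again selected at equal intervals as the final detailed-simulation segments. Compared with traditional sampling simulation, this method reduces redundancy and gives an effective way to account for the periodicity of program behavior. It can also be turned into several other sampling strategies by setting its parameters. The TSSS simulator is the prototype of the two-stage systematic sampling strategy and can also serve as a program behavior analysis tool. Experiments with the TSSS simulator show that two-stage systematic sampling achieves a 15% speedup over SMARTS, currently the fastest strategy, at comparable accuracy.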
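The two-stage selection described above can be sketched as follows. This is a minimal illustration and not the TSSS simulator's code. The parameter names (seg1_len, n1, seg2_len, n2) are hypothetical. Both stages start at offset 0 instead of at a random start, and instructions after the last whole first-stage segment are ignored.

```python
from typing import List, Tuple


def systematic(num_units: int, num_selected: int) -> List[int]:
    """Select num_selected of num_units indices at a fixed interval (systematic sampling)."""
    if not 0 < num_selected <= num_units:
        raise ValueError("require 0 < num_selected <= num_units")
    step = num_units // num_selected
    return [i * step for i in range(num_selected)]


def two_stage_sample(total_insts: int, seg1_len: int, n1: int,
                     seg2_len: int, n2: int) -> List[Tuple[int, int]]:
    """Return [start, end) instruction ranges chosen for detailed simulation.

    Stage 1: cut the dynamic instruction stream into seg1_len-instruction
             segments and pick n1 of them at equal intervals (candidates).
    Stage 2: cut each candidate into seg2_len-instruction sub-segments and
             pick n2 of those at equal intervals (final detailed units).
    Instructions after the last whole stage-1 segment are ignored.
    """
    if seg1_len % seg2_len != 0:
        raise ValueError("seg1_len must be a multiple of seg2_len")
    ranges = []
    for s1 in systematic(total_insts // seg1_len, n1):
        base = s1 * seg1_len
        for s2 in systematic(seg1_len // seg2_len, n2):
            start = base + s2 * seg2_len
            ranges.append((start, start + seg2_len))
    return ranges
```

For example, two_stage_sample(1_000_000, 100_000, 5, 1_000, 10) picks 5 of the 10 large segments and 10 of the 100 sub-segments in each. That gives 50 units of 1,000 instructions, or 5% of the stream.

Now suppose n1 equals the number of first-stage segments, so every large segment is a candidate. If seg1_len // seg2_len is also divisible by n2, the result is ordinary systematic sampling with a fixed interval. This illustrates how parameter choices can reduce the scheme to other sampling strategies.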
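     The Cantor simulation acceleration strategy solves the instruction selection problem in an unconventional way. It applies fractal theory to microarchitecture simulation, using the construction process of the triadic (middle-thirds) Cantor set to select the detailed-simulation instruction segments. The periodicity of benchmark behavior shows up as clustering. The Cantor set has the same property, so its construction can approximate the behavior of a program. The strategy uses this clustering property to build a CPI (Cycles Per Instruction) prediction model. With this model the user sets only one parameter, the number of divisions, before simulating, which makes the strategy simple and straightforward. The CantorSim simulator is the prototype of the Cantor strategy. Experiments with CantorSim show that the Cantor strategy is 23% faster than SMARTS, with an average relative CPI error of 3.2%, so its accuracy remains high.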
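Cantor-set segment selection can be sketched as follows. The sketch assumes that the dynamic instruction stream is mapped linearly onto [0, 1]. It also assumes that the intervals surviving the chosen number of divisions become the detailed-simulation segments. The abstract does not describe CantorSim's exact mapping or its CPI prediction model. Both functions and their names are therefore hypothetical and illustrate only the triadic construction.

```python
from fractions import Fraction
from typing import List, Tuple


def cantor_intervals(divisions: int) -> List[Tuple[Fraction, Fraction]]:
    """Sub-intervals of [0, 1] kept after `divisions` steps of the
    middle-thirds (triadic) Cantor construction."""
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(divisions):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            refined.append((lo, lo + third))   # keep the left third
            refined.append((hi - third, hi))   # keep the right third; the middle is dropped
        intervals = refined
    return intervals


def cantor_segments(total_insts: int, divisions: int) -> List[Tuple[int, int]]:
    """Scale the kept intervals onto instruction indices as [start, end) ranges."""
    return [(int(lo * total_insts), int(hi * total_insts))
            for lo, hi in cantor_intervals(divisions)]
```

For instance, cantor_segments(27_000, 2) keeps (0, 3000), (6000, 9000), (18000, 21000) and (24000, 27000). These are two pairs of nearby segments separated by a large gap, the kind of clustering the strategy exploits.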
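     The functional warm-up acceleration strategy addresses the warm-up problem. In techniques that simulate only some instructions in detail, the functional simulation that precedes detailed simulation does not model microarchitectural state. The start of each detailed simulation is therefore inevitably distorted. The equidistant-sampling functional warm-up strategy divides the functional warm-up segment of a sampling strategy into many small segments of equal length. It selects some of them at equal intervals for functional warm-up and executes the rest in fast functional simulation mode. This raises overall simulation speed while preserving accuracy. The strategy comes with an empirical model for optimizing its parameters, which makes it easy to use. Experiments show that equidistant-sampling functional warm-up improves simulation speed by 27.8% without a significant loss of accuracy, and some metrics are even more accurate.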
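Equidistant-sampling warm-up can be sketched as a schedule over the warm-up window that precedes one detailed unit. The function name and parameters are hypothetical. The sketch picks the warmed segments by counting back from the detailed unit, so the most recent segment is always warmed. That choice is an assumption and is not stated in the abstract.

```python
from typing import List, Tuple


def warmup_schedule(window_len: int, seg_len: int,
                    interval: int) -> List[Tuple[int, int, str]]:
    """Plan the warm-up window that precedes one detailed-simulation unit.

    The window (window_len instructions) is cut into equal seg_len segments.
    Every `interval`-th segment is run with functional warming ("warm":
    functional simulation that also updates caches, branch predictors, etc.);
    the others use plain fast functional simulation ("fast").
    Offsets are relative to the start of the window.
    """
    if window_len <= 0 or seg_len <= 0 or interval <= 0:
        raise ValueError("all arguments must be positive")
    if window_len % seg_len != 0:
        raise ValueError("window_len must be a multiple of seg_len")
    num_segs = window_len // seg_len
    schedule = []
    for i in range(num_segs):
        # Assumption: count from the end of the window so the segment
        # immediately before detailed simulation is always warmed.
        mode = "warm" if (num_segs - 1 - i) % interval == 0 else "fast"
        schedule.append((i * seg_len, (i + 1) * seg_len, mode))
    return schedule
```

With warmup_schedule(10_000, 1_000, 3), segments 0, 3, 6 and 9 are warmed (4,000 of 10,000 instructions) and the rest are fast-forwarded. Setting interval=1 degenerates to full functional warming of the window.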
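     With the arrival of multi-core processor architectures, the speed of microarchitecture simulation faces even greater challenges. How to simulate multi-core processor performance comprehensively, correctly, and quickly has not yet been fundamentally solved. The multi-core simulation acceleration strategy proposes a way to simulate multi-core processors. It also gives a theoretical model for the number of simulation runs that complete multi-core simulation requires, and proposes acceleration methods.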
Micro-architects explore a vast design space to identify the best processor designs. To evaluate design alternatives, micro-architects rely on cycle-level microarchitecture simulators. However, simulation speed is a bottleneck in this design exploration because existing simulators are extremely slow. Worse, simulating multi-core and many-core architectures is becoming increasingly complicated, which makes slow simulation an even bigger problem. There is therefore an urgent need to study acceleration strategies for microarchitecture simulation.
     An effective way to speed up simulation is to simulate in detail only part of a benchmark's full dynamic instruction stream. However, this raises two key challenges: (1) how to select the instructions for detailed simulation, called the instruction selection problem; and (2) how to warm up microarchitectural state before detailed simulation, called the warm-up problem. Both problems serve a common goal: accelerating simulation as much as possible under a given simulation accuracy.
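The "given simulation accuracy" above is usually made concrete with a sample-size bound from sampling theory, of the kind that SMARTS-style statistical sampling relies on. Let ε be the target relative error of the mean CPI, z the standard-normal quantile for the desired confidence, and V the coefficient of variation of per-unit CPI. Then roughly n ≥ (z·V/ε)² detailed units are needed. The sketch below simply evaluates that bound. The function name is hypothetical, and the formula is standard statistics, not a model taken from this thesis.

```python
import math


def required_sample_size(cov: float, rel_error: float, z: float = 3.0) -> int:
    """Return the smallest integer n with n >= (z * cov / rel_error) ** 2.

    cov       -- coefficient of variation (stdev / mean) of per-unit CPI,
                 e.g. estimated from a pilot run (assumption)
    rel_error -- target relative error of the mean CPI, e.g. 0.03 for +/-3%
    z         -- standard-normal quantile for the desired confidence,
                 e.g. 3.0 for roughly 99.7%
    """
    if cov < 0 or rel_error <= 0 or z <= 0:
        raise ValueError("require cov >= 0, rel_error > 0 and z > 0")
    return math.ceil((z * cov / rel_error) ** 2)
```

For example, required_sample_size(1.0, 0.03) gives 10000. Because n grows with (V/ε)², halving the target error quadruples the number of detailed units.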
     The Two-Stage Systematic Sampling simulation strategy is proposed to solve the instruction selection problem. Traditional sampling methods take one of two approaches. Some treat every dynamic instruction segment alike, which causes redundant instructions to be simulated in detail. Others try to capture periodic behavior strictly but struggle to achieve the desired result. The Two-Stage Systematic Sampling approach selects instructions for detailed simulation in two steps:
     (1) Divide the full dynamic instruction stream of a benchmark into large segments of equal length, and apply systematic sampling to select a number of segments as candidate detailed-simulation instructions.
     (2) Divide every segment selected in step 1 into smaller segments, and apply the same sampling strategy as in step 1 to select a number of segments as the final detailed-simulation instructions.
     This approach reduces the redundancy caused by treating every instruction segment alike. By setting its parameters, it can also be turned into several other sampling simulation approaches, such as systematic sampling and stratified sampling simulation. The simulator implementing the Two-Stage Sampling approach (TSSS) can also be used as a program behavior analysis tool. Experiments show that this approach obtains a 15% speedup over the well-known SMARTS approach at the same level of accuracy.
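Comparing approaches "at the same level of accuracy" presupposes an accuracy measure. The abstract does not say which measure the thesis used. One common choice in statistical sampling, assumed here, is the confidence interval of the estimated CPI. The sketch below computes it from per-unit CPI values. The function name is hypothetical. The sketch assumes every sampled unit has the same instruction count, so that the mean of the unit CPIs equals the aggregate CPI of the sampled units.

```python
import math
import statistics
from typing import Sequence, Tuple


def estimate_cpi(unit_cpis: Sequence[float], z: float = 3.0) -> Tuple[float, float]:
    """Estimate whole-program CPI from per-unit CPI samples.

    Assumes equal-length units. Returns (mean CPI, relative half-width of the
    z-sigma confidence interval). The units are treated as a simple random
    sample, which is only an approximation for systematic or two-stage designs.
    """
    n = len(unit_cpis)
    if n < 2:
        raise ValueError("need at least two sampled units")
    mean = statistics.fmean(unit_cpis)
    half_width = z * statistics.stdev(unit_cpis) / math.sqrt(n)
    return mean, half_width / mean
```

For the per-unit CPIs [1.0, 2.0, 3.0, 2.0], it returns a mean of 2.0 and a relative half-width of about 0.61 at z = 3. Four units are far too few for a tight estimate.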
     The Cantor simulation strategy takes an unconventional approach to the instruction selection problem. It applies fractal theory to microarchitecture simulation, selecting the instructions for detailed simulation according to the construction procedure of the triadic Cantor set. Because the Cantor set has a clustering property, it can be used to mimic the similar clustering property of program behavior. Based on experimental observations, this thesis builds a model for determining the number of divisions. Once this single parameter is set, users can simulate benchmarks simply. The Cantor approach achieves a 23% speedup over SMARTS at a slight loss of precision, and it can serve as a complement to other simulation methods.
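The cost of the single parameter, the number of divisions k, follows directly from the construction. Suppose the N dynamic instructions are mapped onto [0, 1] and every surviving interval is simulated in detail (an assumption about how the construction is applied). Then k divisions leave:

```latex
\[
2^{k}\ \text{segments of}\ \frac{N}{3^{k}}\ \text{instructions each},\qquad
f_k = 2^{k}\cdot\frac{1}{3^{k}} = \left(\frac{2}{3}\right)^{k}.
\]
```

For example, k = 10 gives f_10 = 1024/59049 ≈ 1.7% of the stream, spread over 1,024 segments. Each additional division cuts the detailed-simulation work by a further third.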
     The functional warm-up acceleration strategy aims to solve the warm-up problem. Because microarchitectural state is not tracked before detailed simulation, the simulation results may not reflect the real behavior of the hardware. This thesis proposes systematic-sampling functional warming, based on the observation that the state of large hardware structures often depends on a long history. An empirical model for determining the sampling parameters is also provided. Experimental results show that the proposed strategy speeds up simulation by 27.8% with only a marginal loss of accuracy.
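A simple cost model shows where the warm-up saving comes from. The model is an assumption and is not the thesis's empirical parameter model. Let a warm-up window of W instructions be split into equal segments. A fraction ρ of them is functionally warmed at per-instruction cost t_w, and the rest is fast-forwarded at cost t_f < t_w:

```latex
\[
T_{\text{full}} = W\,t_w,\qquad
T_{\text{sampled}} = W\bigl(\rho\,t_w + (1-\rho)\,t_f\bigr),\qquad
1-\frac{T_{\text{sampled}}}{T_{\text{full}}} = (1-\rho)\Bigl(1-\frac{t_f}{t_w}\Bigr).
\]
```

The first two expressions give the window's cost without and with sampling, and the last gives the fraction of warm-up time saved. The model covers warm-up time only. The end-to-end speedup also depends on what share of the run is spent in detailed simulation. A smaller ρ also leaves more cache and predictor state stale when detailed simulation starts.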
     Multi-core and many-core architectures are becoming a new trend in the microprocessor field, yet simulating them is more challenging than simulating single-core processors. The strategies studied in this thesis can be applied to the simulation of multi-core architectures. This thesis also studies possible models of multi-core architectures and how to simulate them.