Research on Performance Evaluation Model Techniques for High Performance Computing
Abstract
As an important means of solving large-scale computational problems, high performance computing is applied ever more widely across science and engineering. With the development of high performance computing, the scale of high performance computers keeps growing and peak system performance has risen rapidly. However, the sustained performance obtained by applications has not grown at the same rate as peak performance, and the gap between them keeps widening. How to identify system bottlenecks, optimize system design, and improve sustained performance is an urgent and difficult problem in high performance computing research. Performance evaluation techniques for high performance computing are an effective way to address this class of problems.
     Because machine architectures and program structures are increasingly complex, more and more factors influence program performance, and these factors interact in complex, nonlinear ways, which poses great challenges to performance evaluation for high performance computing. Traditional performance evaluation methods can no longer meet the needs of evaluating such complex, large-scale parallel systems, so performance model methods that combine application workload characteristics with machine architecture characteristics have attracted strong attention from both academia and industry. Such methods analyze the application's workload characteristics and the machine's performance profile independently, then combine the two sets of parameters mathematically to evaluate system performance. Centered on the fundamental goal of accurately evaluating parallel system performance, this dissertation studies in depth the structural framework of performance evaluation models for high performance computing and their key implementation technologies.
     The dissertation first analyzes the current state and principal approaches of international research on parallel system performance evaluation, focuses on the research projects that have had an important influence on the field, and summarizes their characteristics and shortcomings.
     Synthesizing the many complex factors that influence parallel system performance, and addressing the inadequacy of one-dimensional performance measures, the dissertation proposes a parallel system performance metric system defined over a multidimensional space. It defines the basic performance metrics AIM, SPF, and SPMAO, gives the distance and similarity relations among these metrics, and explains the role of the metric system in parallel system performance analysis. The multidimensional metric system lays the foundation for parallel system performance evaluation and establishes a mapping from real parallel systems to an abstract mathematical space.
     The dissertation analyzes the characteristics of most current parallel system performance evaluation models and proposes a performance model framework for parallel systems, PMPS (Performance Model of Parallel Systems). The model uses a hierarchical convolution method that combines local and global views, and it offers good scalability and openness.
     To reduce the dimensionality of the performance metrics and the complexity of PMPS analysis, the dissertation studies techniques for extracting the key performance factors of processor nodes and proposes an effective method named DoubleP. The method concentrates the many complex performance factors into a few principal components, making the objects of analysis explicit. Through DoubleP analysis, 14 key factors that influence processor node performance and 4 performance principal components are extracted.
     Program performance characteristic analysis is the main means of obtaining application workload characteristics in the PMPS model, and it is also one of the difficult problems in research on parallel system performance models. To analyze program performance characteristics quickly, the dissertation proposes a sampling-based analysis method. Compared with other methods, it effectively reduces the number of instructions analyzed under the same error bound: sampling only 1%-3% of a program's instructions achieves an analysis error of less than 3%. Based on the sampling method, the dissertation implements a program performance characteristic analyzer, SamplePro.
     The processor node performance model is an important component of PMPS. The dissertation proposes a processor node performance solving method and representation model based on multiple linear regression. The method transforms the performance factors and the correlations among them into mutually independent first-order predictor variables, and determines the times of the complex overlapped operations in a program and the weights of different operation types by solving for the regression coefficients. The performance model built with this method is more accurate, its errors are evenly distributed, and it is not affected by processor type or workload characteristics.
     Through the research in this dissertation, the PMPS performance model, with good scalability and openness, is realized. The model can accurately evaluate the performance of various parallel applications on parallel machines, can effectively discover the performance bottlenecks of parallel systems, and can guide the design, optimization, and upgrading of parallel systems.
High performance computing (HPC) is widely used in science and engineering to solve large computational problems. With the development of HPC, the scale of high performance computers has expanded rapidly, and many new technologies and methods have been introduced into the design of processor nodes, so peak performance has increased continuously and rapidly. However, the sustained performance achieved by real applications has not grown at the same rate as peak performance, and the gap between them keeps widening. Performance evaluation of parallel systems, one of the effective ways to address this problem, can locate system bottlenecks and guide the optimization of system design.
     As computer architectures and program structures become more complex, more and more factors affect program performance. Furthermore, these factors interact with each other in complex, nonlinear ways, which makes the performance evaluation of parallel systems a great challenge. Traditional performance evaluation methods can no longer satisfy the needs of evaluating these massively parallel systems. Performance models that combine application signatures with machine profiles have therefore drawn the attention of both the research and industrial communities. Such a method analyzes application signatures and machine profiles independently, and uses convolution methods to map an application signature onto a machine profile to obtain a performance prediction. Aiming at accurately predicting the performance of parallel applications, this dissertation studies the performance model of parallel systems and its key technologies.
     The dissertation first surveys the current status of and active approaches to performance evaluation of parallel systems, analyzes several influential research projects, and summarizes their characteristics and shortcomings.
     Considering the many factors that influence parallel system performance, and addressing the limitations of one-dimensional performance measures, the dissertation proposes a performance metric system for parallel systems defined over a multidimensional space. The metric system defines the basic performance metrics Application Intrinsic Metrics (AIM), System Performance Functions (SPF), and System Performance Metrics, Application Oriented (SPMAO), and gives the distance and similarity relations among them. The multidimensional metric system forms the theoretical basis of this dissertation and establishes a mapping from real parallel systems to an abstract mathematical space.
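     A minimal Python sketch of this mapping idea follows. The dissertation's actual AIM, SPF, and SPMAO definitions are not reproduced in this abstract, so plain Euclidean distance and cosine similarity over hypothetical metric vectors are assumed here purely as stand-ins for its distance and similarity relations.

# Illustrative only: two applications are represented as points in a
# multidimensional performance metric space; distance and similarity are
# assumed to be Euclidean distance and cosine similarity (hypothetical).
import math

def distance(a, b):
    # Euclidean distance between two metric vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    # Cosine similarity between two metric vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical AIM-style vectors: (flops per instruction,
# memory references per instruction, communication bytes per instruction)
app1 = (0.42, 0.31, 0.10)
app2 = (0.45, 0.28, 0.12)
print(distance(app1, app2), similarity(app1, app2))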
     After analyzing the characteristics of most existing performance models of parallel systems, a novel performance model framework, PMPS (Performance Model of Parallel Systems), is proposed based on the convolution method. The model achieves good scalability and extensibility through a hierarchical convolution approach that combines the parts of the system with the whole.
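     The basic convolution idea that PMPS builds on can be sketched as follows; the layered structure of PMPS itself is not shown, and the operation categories, counts, and machine rates below are hypothetical.

# Minimal sketch of the convolution idea underlying PMPS-style models:
# predicted time = sum over operation types of (count in the application
# signature) x (cost per operation from the machine profile).
# Operation categories, counts, and rates here are hypothetical.
app_signature = {          # per-process operation counts from profiling
    "flop": 4.0e11,
    "mem_access": 1.2e11,
    "mpi_bytes": 6.0e9,
}

machine_profile = {        # measured costs on the target machine (seconds per unit)
    "flop": 1.0 / 4.0e9,         # 4 GFLOP/s sustained
    "mem_access": 1.0 / 1.0e9,   # 1 G accesses/s from the memory hierarchy
    "mpi_bytes": 1.0 / 1.0e9,    # 1 GB/s effective network bandwidth
}

def predict_time(signature, profile):
    # Convolve an application signature with a machine profile
    return sum(count * profile[op] for op, count in signature.items())

print("predicted time: %.1f s" % predict_time(app_signature, machine_profile))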
     To reduce the dimensionality of the performance metrics and the complexity of PMPS analysis, a method named DoubleP is proposed to identify the key performance factors of processor nodes. DoubleP concentrates the many complex performance factors into a few principal components, making the objects of analysis explicit. Using DoubleP, 14 key factors that influence processor node performance and 4 principal components of system performance are identified.
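     The internal steps of DoubleP are not described in this abstract; the sketch below only illustrates the kind of principal component analysis that such a factor-reduction step could rely on, applied to a hypothetical matrix of benchmark runs by candidate performance factors.

# Hypothetical sketch: reduce many candidate performance factors to a few
# principal components, as a factor-reduction step like DoubleP might do.
import numpy as np

rng = np.random.default_rng(0)
# rows = benchmark runs, columns = candidate factors (e.g. issue width,
# cache sizes, branch predictor, memory latency, ...); values are made up.
X = rng.normal(size=(40, 14))

Xc = X - X.mean(axis=0)                    # center each factor
cov = np.cov(Xc, rowvar=False)             # 14 x 14 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
explained = eigvals[order] / eigvals.sum()

# Keep the leading components that explain most of the variance.
k = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1
components = eigvecs[:, order[:k]]
print("components kept:", k)
print("variance explained:", np.round(explained[:k], 3))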
     Program profile analysis is the main means of studying application signatures in PMPS, and it is also one of the difficult problems in research on parallel system performance models. To analyze program profiles quickly, a sampling-based method is proposed. Compared with other methods, this technique reduces the number of instructions that must be analyzed under the same error bound: sampling only 1%-3% of a program's instructions keeps the analysis error below 3%. Based on this sampling method, a profiler named SamplePro is designed and implemented in the dissertation.
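     The sampling design of SamplePro is not detailed in this abstract; the following sketch only illustrates the underlying statistical idea, estimating an instruction mix from a small random sample of a synthetic trace and comparing it against the full trace (the categories and mix are made up).

# Hypothetical sketch of sampling-based profile analysis: estimate the
# instruction mix from a 2% random sample of a synthetic trace and
# compare it with the full-trace mix. Categories and mix are made up.
import random
from collections import Counter

random.seed(1)
categories = ["int", "fp", "load", "store", "branch"]
weights = [0.40, 0.15, 0.25, 0.10, 0.10]
trace = random.choices(categories, weights=weights, k=1_000_000)

full_mix = Counter(trace)
sample = random.sample(trace, k=len(trace) // 50)   # 2% sample
sample_mix = Counter(sample)

for c in categories:
    true_p = full_mix[c] / len(trace)
    est_p = sample_mix[c] / len(sample)
    rel_err = abs(est_p - true_p) / true_p
    print("%-6s true=%.4f est=%.4f rel.err=%.2f%%" % (c, true_p, est_p, 100 * rel_err))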
     The performance model of processor nodes is a main part of PMPS. The dissertation presents a processor node performance model and a solving method based on multiple linear regression. The method converts the performance factors and the correlations among them into mutually independent predictor variables, and determines the times of complex overlapped operations and the weights of different operation types by solving for the regression coefficients. Experimental results show that the regression-based model is accurate, that its errors are evenly distributed, and that it is not affected by processor type or application signature.
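     The predictor variables actually used in the dissertation's node model are not listed in this abstract; the sketch below only shows the general form of such a regression, fitting per-operation-type weights from synthetic operation counts and measured runtimes.

# Hypothetical sketch of a regression-based node model: runtimes are
# regressed on per-operation-type counts, so the fitted coefficients play
# the role of per-operation weights. All data below are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_programs, n_op_types = 30, 4

# X[i, j] = dynamic count of operation type j in program i
X = rng.integers(10**6, 10**8, size=(n_programs, n_op_types)).astype(float)

true_cost = np.array([1.0e-9, 4.0e-9, 2.5e-8, 6.0e-8])          # seconds per operation
t = X @ true_cost * (1.0 + 0.02 * rng.normal(size=n_programs))  # noisy measured times

# Ordinary least squares for the per-operation weights.
beta, *_ = np.linalg.lstsq(X, t, rcond=None)
print("fitted weights:", beta)
print("relative errors:", np.abs(beta - true_cost) / true_cost)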
     The experimental results indicate that the PMPS performance model, with its good scalability and extensibility, can accurately predict the running time of various parallel applications on parallel computers and can discover the performance bottlenecks of parallel systems. The PMPS model provides abundant performance parameters and guidance for designing, optimizing, and upgrading parallel computing systems.