Research on OpenMP Compilation and Optimization Techniques
Abstract
This dissertation studies compilation and optimization techniques for OpenMP programs.
    The first part of the dissertation studies source-level optimization techniques for OpenMP programs. The main goal is to transform simple fork-join style OpenMP programs into SPMD style programs, which express the parallelism in a program more efficiently. The optimizations include schedule parameter optimization for parallel loops; a parallel region expansion and merging algorithm; the elimination of redundant directives, especially redundant synchronization, that expansion and merging make possible; and optimization of the data attributes of variables in parallel regions. The main contributions of this part are as follows:
    A novel schedule parameter optimization algorithm for parallel loops is presented. It determines a near-optimal schedule by considering the effect of the schedule parameters on the different kinds of overhead in an OpenMP program and, in particular, the requirements that backend optimizations place on those parameters. It therefore prevents the performance degradation caused by unsuitable schedule parameters more effectively.
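    As a rough illustration of what a schedule parameter is, the C sketch below runs a loop under schedule(static, chunk) and picks the chunk size with a hypothetical heuristic. The names pick_chunk, scale, and unroll, and the rounding rule itself, are assumptions made for this example; the dissertation's algorithm is not reproduced here. The sketch only shows one plausible backend requirement: a chunk size that is a multiple of the unroll factor lets each full chunk run as whole unrolled loop bodies.

```c
#include <assert.h>

/* Hypothetical heuristic, NOT the dissertation's algorithm: split n
 * iterations evenly across the threads, then round the chunk size up to
 * a multiple of `unroll` (an assumed backend unroll factor). */
int pick_chunk(int n, int threads, int unroll)
{
    assert(n > 0 && threads > 0 && unroll > 0);
    int chunk = (n + threads - 1) / threads;        /* ceil(n / threads) */
    return (chunk + unroll - 1) / unroll * unroll;  /* round up to a multiple */
}

/* Scale a vector under a static schedule with an explicit chunk size.
 * Without OpenMP support the pragma is ignored and the loop runs serially. */
void scale(double *a, int n, double s, int chunk)
{
    assert(n >= 0 && chunk > 0);
    #pragma omp parallel for schedule(static, chunk)
    for (int i = 0; i < n; i++)
        a[i] *= s;
}
```

    With n = 1000, 4 threads, and an unroll factor of 8, pick_chunk returns 256: ceil(1000 / 4) = 250, rounded up to the next multiple of 8.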
    A new parallel region expansion and merging algorithm is proposed to form SPMD regions. It differs from similar methods in two ways. First, it expands aggressively: conflicts between variable data attributes that arise during merging are resolved by privatizing variables and computation. Second, it can hoist parallel regions across procedure boundaries. The algorithm therefore forms larger parallel regions, which expose more optimization opportunities.
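    The difference between the two styles can be sketched in C as follows. This is a hand-written illustration, not output of the dissertation's algorithm; the function names and the variable offset are assumptions. In the fork-join version, serial code sits between two parallel loops. In the merged version, that serial computation moves inside the single parallel region, where each thread recomputes it into its own private copy, which is one reading of "variable and computation privatization".

```c
#include <assert.h>

/* Fork-join style: each loop opens and closes its own parallel region. */
void update_fork_join(double *a, double *b, int n)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * a[i];

    double offset = 0.5 * n;            /* serial code between the regions */

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        b[i] = a[i] + offset;
}

/* SPMD style: one merged region containing two work-sharing loops. */
void update_spmd(double *a, double *b, int n)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * a[i];
        /* The implicit barrier after the loop above is kept, because the
         * second loop reads the values of a written by the first. */

        double offset = 0.5 * n;        /* declared in the region: private,
                                           recomputed by every thread */

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = a[i] + offset;
    }
}
```

    Both functions compute the same result; the merged version creates the thread team once instead of twice. Compiled without OpenMP support, the pragmas are ignored and both run serially.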
    New optimization algorithms for SPMD regions in OpenMP programs are proposed, covering synchronization optimization and variable data attribute optimization. The former reduces the overhead caused by redundant directives and synchronization operations. The latter merges private variables by optimizing their data attributes, which reduces space overhead and improves memory locality.
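    A minimal C sketch of redundant synchronization inside an SPMD region, assuming the two arrays do not overlap: the first work-sharing loop writes only a and nothing later in the region reads a, so the implicit barrier at its end can be dropped with the nowait clause. The function name init_two is hypothetical, and the example is hand-optimized; it does not show how the dissertation's algorithm detects such barriers.

```c
#include <assert.h>

/* Fill two independent arrays inside one parallel region.
 * Assumption: a and b do not overlap. */
void init_two(double *a, double *b, int n)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        /* No later code in the region reads a, so the implicit barrier
         * at the end of this loop is redundant and nowait removes it. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 0.5 * i;

        /* The barrier at the end of the parallel region still guarantees
         * that both arrays are complete when the function returns. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = i + 1.0;
    }
}
```

    If the second loop read a, the barrier would be required and nowait would introduce a data race.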
    The second part of the dissertation studies the efficient compilation of OpenMP programs. The main contributions are:
    A translation and optimization framework for OpenMP programs is proposed, based on a global analysis of the nesting types of OpenMP directives. It allows directives to be translated and optimized more effectively, eliminates part of the extra overhead, and improves the performance of the runtime library.
    Based on this analysis and translation framework, an OpenMP compilation and optimization system is implemented on IA64/Linux. It serves as a research platform for high-performance computing and thread-level parallelism on related platforms, and as part of a larger OpenMP development environment. Tests show that the system is functionally fairly complete and performs well, and they confirm the effectiveness of the proposed optimization and translation algorithms.