Research on Performance Optimization Techniques for Parallel Discrete Event Simulation on Multi-core Clusters
Abstract
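As simulation is applied ever more widely, simulated systems keep growing in scale and individual models keep growing in complexity, so simulation systems demand more and more computing resources. How to shorten the run time of large-scale simulation systems and improve the efficiency of simulation applications is currently a hot research topic in parallel discrete event simulation (PDES). Meanwhile, with the rise of the multi-core computing revolution, general-purpose multi-core CPU parallelism is becoming the mainstream trend. The low communication latency among the cores on a single multi-core chip offers considerable performance potential. However, most simulation systems currently running on multi-core clusters still use the parallel simulation techniques developed for traditional clusters; although they achieve some parallel speedup, they struggle to exploit the full performance potential of multi-core processors. Research on performance optimization techniques for PDES on multi-core clusters is therefore of great theoretical and practical significance for exploiting that potential, improving the run-time efficiency of simulation applications, and advancing large-scale simulation applications in China.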
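     To address the difficulty current PDES systems have in using multi-core computing resources effectively, and with the fundamental goal of further improving the run-time efficiency of PDES applications, this dissertation studies in depth the key issues that affect PDES performance: time management algorithms, load balancing, shared state attribute access mechanisms, and communication optimization. The main work and contributions are as follows: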
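     (1) A time management mechanism for parallel simulation on multi-core clusters is proposed that supports a parallel mode combining multi-core multithreading with MPI. Simulation time synchronization is the core factor determining the run-time performance of PDES. Multithreaded parallel scheduling can take full advantage of the shared memory address space and low communication latency of multi-core processors, but current simulation engines lack efficient synchronization support for the combined multithreading-and-MPI mode. To address this, building on an in-depth analysis of multithreaded parallel scheduling mechanisms and distributed synchronization algorithms for PDES, the dissertation proposes such a time management mechanism. It uses a modified Mattern algorithm to accommodate the heterogeneous combination of multithreading and MPI, manages the life cycle and state transitions of every message event with a finite state machine, and avoids lock overhead through a lock-free mechanism for modifying event state. Experimental results show that, under a reasonable run configuration, the proposed mechanism achieves good parallel speedup on a multi-core cluster as the number of compute nodes grows; when the probability of inter-core interaction among simulation entities exceeds 40%, the speedup over the multi-process MPI mode reaches about 1.8.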
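     (2) An event scheduling mechanism that balances load automatically for PDES systems on multi-core processors is proposed. In a PDES system, load balance affects synchronization and communication overhead and hence the performance of the whole system. Current dynamic load balancing mechanisms for parallel simulation can hardly achieve both low event scheduling overhead and strong load balancing capability. To address this, the dissertation proposes a global scheduling mechanism based on distributed queues. The mechanism achieves dynamic load balance through global scheduling and, to reduce the cost of global scheduling, uses a distributed event queue structure and lock-free event data structures. Experimental results show that, in cases where traditional algorithms incur heavy rollback or cannot readily achieve dynamic load balance, the proposed mechanism not only keeps event scheduling overhead low but also reduces the rollback rate by 10% to 80%, demonstrating good load balancing capability.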
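     (3) A shared attribute access mechanism based on transactional memory is proposed for PDES systems in multi-core environments. PDES systems often involve a large volume of state data communication, which tends to degrade performance. Most current shared attribute communication mechanisms use object proxy techniques, which cannot take full advantage of the shared memory address space and low communication latency of multi-core processors. To address this, the dissertation designs PDES-STM, a shared attribute access mechanism based on transactional memory. Following the characteristics of PDES, the mechanism maps each simulation event to a transaction; the correctness of accesses to shared state attributes by concurrently executing events is guaranteed by the conflict detection and resolution mechanism of the transactional memory system. A case analysis shows that PDES-STM effectively reduces memory overhead and the number of messages; runs on the test platform show that the performance advantage of PDES-STM over a message-based "pull" access mechanism becomes increasingly pronounced as the probability of accessing external attributes increases.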
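     (4) A communication latency hiding algorithm for parallel simulation systems on multi-core clusters is proposed. Communication latency is one of the main factors limiting the scalability and performance of simulation in distributed computing environments. Existing communication optimization techniques for simulating large-scale fine-grained models in parallel and distributed environments do not hide enough communication latency. To address this, the dissertation proposes an optimized B+2R latency hiding algorithm, the O(B+2R) algorithm, which trades computation for communication. The algorithm chooses a reasonable value of R according to the network transfer time and hides communication latency by computing R additional time steps on stale entity state values that have been received, thereby gaining more concurrency. Theoretical analysis shows that, under a reasonable configuration, the algorithm can hide communication latency completely. In addition, considering the respective characteristics of CPUs and GPUs, the O(B+2R) algorithm is extended to the GPU platform as the B+2(R×r) algorithm. Experimental results show that the proposed mechanism hides more communication latency and improves performance by more than 40% under a reasonable run configuration, and that it can be used to balance computation and communication overhead among computing resources at the various levels of a multi-core cluster.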
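     Building on the above results, the dissertation designs and implements a hierarchical PDES framework for multi-core clusters on top of the Musik simulation engine, and tests it comprehensively on a multi-core cluster using a model of public opinion trends under public emergencies. The results show that the optimization techniques improve overall simulation system performance by about 45% and exhibit good scalability.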
The current trend in processor architecture design is the integration of multiple cores on a single processor. Clusters built from such microprocessors are widely adopted by Parallel Discrete Event Simulation (PDES) developers for large-scale simulation applications. The tightly integrated processing cores on one chip, with communication latencies substantially lower than those in conventional clusters, offer potential performance improvements, especially for fine-grained PDES. Thus, one research focus in the PDES domain is modifying software platforms to use the computational resources of multi-core processors efficiently.
     Considering the characteristics of multi-core clusters and PDES systems, this dissertation investigates ways to improve the performance of large-scale or complex simulation programs by addressing several factors that affect PDES performance, including event scheduling, shared-attribute access, and communication optimization. The contributions of this dissertation are as follows:
     First, a synchronization algorithm that combines a multi-threaded parallel mode with MPI is proposed. Experimental results show that the multi-threaded parallel scheme outperforms the multi-process parallel mode in most cases. However, currently available simulation engines either lack support for combining a multi-threaded parallel mode with MPI in distributed computing environments, or the relevant technologies are immature. This dissertation considers the compatibility of the multi-threaded parallel mode with cluster computing platforms and proposes a time management mechanism that combines multi-threaded execution on each machine with MPI communication among all machines in the cluster. A group of tests shows that this hybrid mechanism performs well on a multi-core cluster.
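     Time management in optimistic PDES generally depends on an estimate of global virtual time (GVT): a lower bound on the timestamp of any event that could still be processed. The following is a minimal sketch of that quantity under a two-level (threads within a node, then nodes) layout, not the dissertation's synchronization algorithm. The names `Worker`, `gvt_lower_bound`, and `hierarchical_gvt` are hypothetical, and the computation is done synchronously on a frozen snapshot, which a real asynchronous algorithm cannot assume.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One simulation thread (hypothetical structure, for illustration only)."""
    local_virtual_time: float
    # Timestamps of messages this worker has sent that are not yet received.
    in_transit: list = field(default_factory=list)


def gvt_lower_bound(workers):
    """Return the minimum over all local clocks and in-transit timestamps."""
    candidates = []
    for w in workers:
        candidates.append(w.local_virtual_time)
        candidates.extend(w.in_transit)
    return min(candidates)


def hierarchical_gvt(nodes):
    """Two-level reduction: threads within a node first, then across nodes.

    `nodes` is a list of lists of Worker. The inner min stands in for a
    shared-memory reduction and the outer min for an MPI-style reduction;
    both are assumptions made for this sketch.
    """
    return min(gvt_lower_bound(threads) for threads in nodes)


nodes = [
    [Worker(12.0), Worker(9.5, in_transit=[7.0])],  # node 0: two threads
    [Worker(8.0), Worker(15.0)],                    # node 1: two threads
]
gvt = hierarchical_gvt(nodes)
print(gvt)
```

     The in-transit message with timestamp 7.0 bounds the result even though every local clock is ahead of it, which is why a GVT algorithm has to account for messages that have been sent but not yet received.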
     Second, a global scheduling mechanism based on a distributed event queue is proposed to improve the performance of Time Warp systems on multi-core platforms. Current dynamic load balancing technologies cannot achieve the twin goals of good balance and low event-scheduling overhead. Taking advantage of the shared memory address space and low communication latency of multi-core architectures, this dissertation proposes a global scheduling mechanism based on a distributed event queue. Its specially designed data structures and algorithms substantially reduce the cost of lock operations. Compared with local scheduling on a distributed event queue, the experimental results show that global scheduling on a distributed queue effectively decreases the rollback rate and balances workloads at a low event-scheduling cost for Time Warp systems on multi-core platforms.
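     The difference between local and global scheduling over a distributed event queue can be sketched as follows. This is a single-threaded illustration under stated assumptions: the class `DistributedEventQueue` and its methods are hypothetical names, ordinary heaps replace the dissertation's specially designed data structures, and no concurrency control is shown.

```python
import heapq


class DistributedEventQueue:
    """One priority queue per worker; events are (timestamp, name) tuples."""

    def __init__(self, num_workers):
        self.queues = [[] for _ in range(num_workers)]

    def push(self, worker_id, timestamp, name):
        heapq.heappush(self.queues[worker_id], (timestamp, name))

    def pop_local(self, worker_id):
        """Local scheduling: a worker only sees its own queue."""
        q = self.queues[worker_id]
        return heapq.heappop(q) if q else None

    def pop_global(self):
        """Global scheduling: take the earliest event held by any queue."""
        non_empty = [q for q in self.queues if q]
        if not non_empty:
            return None
        best = min(non_empty, key=lambda q: q[0])  # compare queue heads
        return heapq.heappop(best)


deq = DistributedEventQueue(num_workers=2)
for ts in (1.0, 2.0, 3.0, 4.0):
    deq.push(0, ts, f"e{ts}")     # worker 0 is overloaded
deq.push(1, 10.0, "late")         # worker 1 holds only a far-future event

local_choice = deq.queues[1][0]   # what worker 1 would run under local scheduling
global_order = [deq.pop_global() for _ in range(5)]
print(local_choice, [t for t, _ in global_order])
```

     Under local scheduling, worker 1 would execute the event at timestamp 10.0 while earlier events are still pending on worker 0, which is the kind of run-ahead that tends to cause rollbacks in Time Warp. Under global scheduling, any free worker takes the earliest pending event, so the load evens out without explicit migration.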
     Third, a shared attribute/state access mechanism based on transactional memory is proposed to make it easier for users to model their systems and to improve the performance of Time Warp systems on multi-core platforms. The mechanism provides transparent access to shared attributes through a simple API and offers more powerful modeling ability for agent-based simulation applications. A case study demonstrates how to use the mechanism and what benefits it brings. Theoretical analysis shows that this access mechanism not only eases the attribute publishing/subscribing burden on simulation model developers but also reduces the number of messages. The experimental results show that the STM-based shared attribute access mechanism markedly outperforms the conventional "pull" mechanism on multi-core platforms.
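     The idea of running a simulation event as a transaction over shared attributes can be illustrated with a small version-validating scheme. This is a minimal single-threaded sketch of optimistic conflict detection in general, not the dissertation's mechanism or API. `SharedAttribute`, `Transaction`, and `run_event` are hypothetical names, and Time Warp rollback of already committed events is not modeled.

```python
class SharedAttribute:
    """A shared state attribute with a version counter for conflict detection."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Buffers reads and writes; commit validates that nothing read has changed."""

    def __init__(self):
        self.read_versions = {}   # attribute -> version seen at first read
        self.writes = {}          # attribute -> pending new value

    def read(self, attr):
        if attr in self.writes:
            return self.writes[attr]
        self.read_versions.setdefault(attr, attr.version)
        return attr.value

    def write(self, attr, value):
        self.writes[attr] = value

    def commit(self):
        # Conflict: another transaction committed to something we read.
        if any(a.version != seen for a, seen in self.read_versions.items()):
            return False
        for attr, value in self.writes.items():
            attr.value = value
            attr.version += 1
        return True


def run_event(handler, max_retries=10):
    """Execute one simulation event as a transaction, retrying on conflict."""
    for attempt in range(1, max_retries + 1):
        tx = Transaction()
        handler(tx)
        if tx.commit():
            return attempt
    raise RuntimeError("event could not commit")


opinion = SharedAttribute(0)

# Two events read the same attribute before either commits.
tx_a, tx_b = Transaction(), Transaction()
tx_a.write(opinion, tx_a.read(opinion) + 1)
tx_b.write(opinion, tx_b.read(opinion) + 1)
a_ok = tx_a.commit()   # first committer wins
b_ok = tx_b.commit()   # stale read is detected, so this event must retry

attempts = run_event(lambda tx: tx.write(opinion, tx.read(opinion) + 1))
print(a_ok, b_ok, opinion.value, attempts)
```

     The event handler reads and writes the attribute directly, with no publish/subscribe step and no request message to the owner, which is the modeling convenience the paragraph above attributes to transparent access.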
     Fourth, a more effective latency-hiding mechanism is proposed for the parallelization of agent-based model simulations (ABMS) with millions of agents. The current B+2R latency-hiding algorithm hides only part of the communication latency. This dissertation proposes a new latency-hiding algorithm whose principle is to trade a certain amount of redundant computation for communication. An analytical model of the algorithm is given, and theoretical analysis shows that the algorithm can hide all of the communication latency when a proper R is selected. In addition, a B+2(R×r) algorithm that combines the new and the old B+2R algorithms is designed to make the new B+2R algorithm effective on GPU platforms. The experimental results indicate the benefits of the new B+2R latency-hiding scheme, which delivers more than 40% improvement in run time for certain benchmark ABMS application scenarios with several billion agents.
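     The trade of redundant computation for communication can be made concrete with a one-dimensional example. The sketch below assumes that B+2R refers to a block of B owned cells padded with R ghost cells on each side, so that one halo exchange supports R local time steps. It illustrates only this baseline idea, not the new algorithm or its B+2(R×r) extension. The update rule, the modulus 97, and all function names are arbitrary choices for illustration.

```python
def step(cells):
    """Update every cell that has both neighbours (the two end cells drop out)."""
    return [(cells[i - 1] + cells[i] + cells[i + 1]) % 97
            for i in range(1, len(cells) - 1)]


def global_steps(state, steps):
    """Reference: update the whole periodic ring one step at a time."""
    n = len(state)
    for _ in range(steps):
        state = [(state[(i - 1) % n] + state[i] + state[(i + 1) % n]) % 97
                 for i in range(n)]
    return state


def block_steps(state, start, B, R):
    """Advance one block of B cells by R steps after a single halo exchange.

    The block copies R ghost cells from each side (B + 2R cells in total),
    then computes R steps locally. The usable region shrinks by one cell per
    side per step, leaving exactly the B owned cells after R steps.
    """
    n = len(state)
    local = [state[(start - R + k) % n] for k in range(B + 2 * R)]
    for _ in range(R):
        local = step(local)
    return local


N, B, R = 12, 4, 3
initial = [(7 * i + 3) % 97 for i in range(N)]
expected = global_steps(initial, R)

combined = []
for start in range(0, N, B):
    combined.extend(block_steps(initial, start, B, R))
print(combined == expected)
```

     Each block recomputes some cells that its neighbours also compute, but it exchanges data once per R steps rather than once per step. A larger R means fewer exchanges and more redundant work, which is why the choice of R matters.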
     Finally, extensive performance optimization has been carried out on a simulation application that forecasts the trend of public opinion under critical conditions, in order to reduce memory overhead and achieve scalability. The experimental results demonstrate that the supported system scale increases by one order of magnitude on a single multi-core machine and that good scalability is achieved on a multi-core cluster.