多核系统下并行节点复制垃圾收集算法研究

英文题名：Research on Algorithm of Parallel Copying Garbage Collection Based on Lisp 2 for Multicore System
作者：吴长茂
论文级别：硕士
学科专业名称：计算机软件与理论
中文关键词：多核 ; 垃圾收集 ; Lisp ; 2算法 ; 节点复制算法 ; 并行
英文关键词：multicore ; garbage collection ; Lisp 2 ; copying algorithm ; parallelization
学位年度：2011
导师：张聪品
学科代码：081202
学位授予单位：河南师范大学
论文提交日期：2011-05-01

摘要

随着面向对象语言和程序设计方法的广泛使用,垃圾收集日益受到重视。垃圾收集,即自动内存管理,是指运行时系统负责自动回收无用对象的一种―废料‖回收机制。当代流行语言如Java、C#语言等都具有垃圾收集功能。垃圾收集机制的出现,使程序员只需专注于对象的分配,而无需考虑对象何时被销毁,垃圾对象的回收由垃圾收集器在后台动态地完成而毋须程序员干预。垃圾收集避免了内存泄漏和不正确的内存操作(如悬挂引用)引起的软件错误,提高了软件健壮性。然而,垃圾收集器回收垃圾对象时造成用户程序反应迟缓,一定程度上影响了用户程序效率和用户体验。因此,如何缩短垃圾回收时间,提高垃圾收集器效率,对于提高用户程序执行效率和改善用户体验具有重要的应用价值。
     近年来,多核CPU和多核GPU日益普及,以及它们并行计算性能的提高,为垃圾收集并行化提供了坚实的硬件基础。
     在多核系统下,本文基于Lisp 2算法提出了一种新颖的节点复制算法以及该算法的并行化算法,并分别给出了该并行化算法在多核CPU和多核GPU下的一个实现。围绕多核系统下并行节点复制垃圾收集算法研究,本文重点完成了以下工作:
     (1)深入研究了常用垃圾收集算法。通过查阅大量的垃圾收集文献,对常用垃圾收集算法如引用计数算法、标记-清扫算法、标记-压缩算法、节点复制算法、分代式算法特别是并行垃圾收集算法进行了深入研究,比较了这些算法的优缺点及适用环境。
     (2)基于Lisp 2算法,提出了一种新颖的节点复制垃圾收集算法-- Lisp 2-Copying-GC-Algorithm。基本思想是把Lisp 2垃圾收集算法在整个内存堆上收集改为在节点复制算法的一个半区(称为FromSpace半区)上进行,把存活对象复制到另一个半区(称为ToSpace半区)。
     (3)提出了Lisp 2-Copying-GC-Algorithm算法的并行化算法--并行节点复制垃圾收集算法(Lisp 2-Parallel-Copying-GC)。基本思想是把内存堆的两个半区(FromSpace半区和ToSpace半区)划分为大小相同的块(block),为块定义了6个不同状态,这些状态标识了在垃圾收集过程中块(block)的不同地位和作用。这样,块为并行垃圾收集线程提供了一个收集尺度,而块的状态为并行垃圾收集线程间的同步提供了保障。
     (4)给出了并行节点复制垃圾收集算法(Lisp 2-Parallel-Copying-GC)在多核CPU环境下基于OpenMP3.0规范的一个实现。OpenMP是一种面向共享内存以及分布式共享内存的多处理器(多核)多线程并行编程语言,是一种多线程、共享内存的并行应用程序编程接口。实验数据显示,在4个逻辑核心下该算法加速比达到了2.53,提高了垃圾收集效率。
     (5)给出了并行节点复制垃圾收集算法(Lisp 2-Parallel-Copying-GC)在多核GPU环境下基于NVIDIA CUDA(Compute Unified Device Architecture)架构的一个实现。CUDA是NVIDIA并行计算架构,通过利用GPU使计算能力大幅提高。本文首次在垃圾收集领域利用CUDA加速进行了有益尝试。实验结果表明,得益于多核GPU数量众多的处理核心和轻量级线程设计,多核GPU相对多核CPU的加速比达到了278.13。
     多核环境下多线程并行垃圾收集提高了垃圾收集效率,改善了用户体验。
As object-oriented languages and programming approach are widely used at present, garbage collection is drawing people’s more attention. Garbage collection which is also called Dynamic Memory Management is a mechanism of deallocating unreachable objects by Runtime System. The prevailing object-oriented languages such as C# and Java all support garbage collection. With garbage collection, programmers only pay attention to allocate memory for objects without taking destroying objects into consideration. The unreachable objects are deallocated automatically by garbage collector without the intervention of programmers, which considerably decreases the burden of programmers. Garbage collection avoids memory leak and errors caused by incorrect memory operations, e.g. dangling reference, improves the robustness of software as well. The performance and efficiency of garbage collector, however, directly affect the program’s executive efficiency and user’s experience. Therefore, improving the efficiency of garbage collector is of great importance in application fields.
     At the present time, multicore CPU and multicore GPU increasingly prevail, consequently their parallel computing ability sharply rise, all these offer the parallelization of garbage collection a solid basis
     With the study on garbage collection, we have mainly accomplished the following works in this paper:
     1. We have deeply researched on the algorithms of garbage collection
     By means of looking up a large number of papers and treatises about garbage collection, we do many researches on the garbage collection algorithms e.g. Reference-Count algorithm, Mark-Sweep Algorithm, Copying Algorithm as well Generational Algorithm, especially parallel garbage collection algorithm, and compare these methods in the aspects of disadvantages, advantages and applicable surroundings.
     2. We have presented a novel copying garbage collection algorithm (Lisp 2-Copying-GC-Algorithm) based on Lisp 2 through deallocating unreachable objects on FromSpace and copying live objects to ToSpace.
     3. By means of dividing heap into equal blocks and defining 6 statuses for blocks, we have presented the parallel algorithm of Lisp 2-Copying-GC-Algorithm, called Lisp 2-Parallel-Copying-GC.
     4. The implementation of Lisp 2-Parallel-Copying-GC in multicore CPU system
     In multicore CPU system, we have implemented the algorithm of Lisp 2-Parallel-Copying-GC using the OpenMP 3.0 specification. OpenMP is a parallel language for shared memory system and distributed shared memory system, and also is a programming interface for multithread programming. The experimental results show that in multicore CPU system the proposed algorithm is able to improve the garbage collection efficiency in a large scale.
     5. The implementation of Lisp 2-Parallel-Copying-GC in multicore GPU system
     CUDA is NVIDIA’s parallel computing architecture. It enables dramatic increases in computing performance by harnessing the power of the GPU. For the first time we conduct an effective exploration in garbage collection field using GPU system. In multicore GPU system, we have implemented Lisp 2-Parallel-Copying-GC. From the results we can see that the garbage collection efficiency is much improved in multicore GPU system than that in multicore CPU system.
     From the works above, we realize that with the multicore framework (multicore CPU system and multicore GPU system) prevailing, if garbage collection could employ the strong computing ability in multicore system, the efficiency of garbage collector will be largely improved so that the user’s experience will be better.

引文

[1] H.Gelernter, J.R.Hansen, C.L.Gerberich. A Fortran-compiled list processing language [J].Journal of the ACM, 1960 April, Volume 7 Issue 2: 87-101.
    [2] Daniel R. Edelson. A mark-and-sweep collector C++ [J].Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages ,1992,February,51-58.
    [3]谢之易.一种新的适用于面向对象程序设计语言的保守式垃圾收集机制[J].计算机应用与软件,2008,25(1):96-99,123.
    [4] Apache Harmony is the Java SE project of the Apache Software Foundation [EB/OL]. http://harmony.apache.org.
    [5] Dynamic Runtime Layer Virtual Machine Developer's Guide[EB/OL]. http:// harmony.apache.org/subcomponents/drlvm/developers_guide.html#VM_Architecture.
    [6] CUDA - CUDA GPU[EB/OL]. http://www.nvidia.cn/object/cuda_gpus_cn.html.
    [7] Richard Jones, Rafael Lins著,谢之易译.垃圾收集[M].北京:人民邮电出版社,2004.
    [8] George E. Collins. A method for overlapping and erasure of lists [J]. Communications of the ACM, 1960 December, Volume 3 Issue 12: 655-657.
    [9] John McCarthy. Recursive functions symbolic expressions and their computation by machine [J]. Communications of the ACM, 1960, Volume 3: 184-195.
    [10] Marvin L.Minsky. A lisp garbage collector using special secondary storage[J].Technical Report Memo 58(rev.), Project MAC, MIT, Cambridge,MA,1963 December.
    [11]裘宗燕.Garbage Collection---问题和技术[J].程序员,2003,1:53-58.
    [12]赖少庆.介绍废料收集的一些新技术[J].计算机应用,1983,39-12.
    [13] CommonLanguageRuntimeOverview[EB/OL].http://msdn.microsoft.com/en-us/library /ddk909ch(v=VS.90).aspx.
    [14] Narendran Sachindran,J. Eliot B. Moss.Mark-Copy: Fast copying GC with less space overhead[C].Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications(OOPSLA '03), 2003,November,326-343.
    [15] Christine H. Flood, David Detlefs, Nir Shavit, et al. Parallel garbage collection for shared memory multiprocessors[J]. Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1:21-21.
    [16] Xiao-Feng Li, Ligang Wang, and Chen Yang. A Fully Parallel LISP2 Compactor with preservation of the Sliding Properties[C]. Languages and Compilers for Parallel Computing 2008:264-278.
    [17] Apache Harmony - Generational Garbage Collector v5[EB/OL].
    [18] Toshio Endo; Kenjiro Taura; and Akinori Yonezawa. A scalable mark-sweep garbage collector on large-scale shared-memory machines[C].Proceedings of High Performance Networking and Computing (SC97),November 1997.
    [19] Ming Wu and Xiao-Feng Li. Task-pushing: a Scalable Parallel GC Marking Algorithm without Synchronization Operations[C]. Parallel and Distribution Processing Symposium (IPDPS) 2007, Long Beach, CA, March 2007:1-10.
    [20] Robert H. Halstead. Implementation of Multilisp: Lisp on a multiprocessor.ACM Symposium on LISP and Functional Programming, 1984:9-17.
    [21] Diab Abuaiadh, Yoav Ossia, Erez Petrank, et al. An efficient parallel heap compaction algorithm[C]. In the ACM Conference on Object-Oriented Systems, Languages and Applications, 2004:224-236.
    [22] Intel(?) Hyper-Threading Technology (Intel(?) HT Technology) [EB/OL]. http://www.intel.com/technology/platform-technology/hyper-threading/index.htm.
    [23] Herb Sutter, James Larus. Software and The Concurrency Revolution [J].Queue, Volume 3, Issue 7, 2005, 9, 56-62.
    [24] OpenMP Specifications[EB/OL]. http://www.openmp.org/mp-documents -SummarySpec.pdf.
    [25] OpenMP in VisualC++[EB/OL]. http://msdn.microsoft.com/ en-us/library/ tt15eb9t(v=VS.90).aspx.
    [26] OpenMP Compilers[EB/OL].http://openmp.org/wp/openmp-compilers/.
    [27] Intel? Compilers and Libraries-Intel Software Network[EB/OL].http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm.
    [28] D r. Fridtjof Siebert. Limits of Parallel Marking Garbage Collection. ISMM’08[C], Tucson, Arizona, USA, June 7–8, 2008:21-29.
    [29] Congpin Zhang, Changmao Wu, and Lili Zhao. Research on Algorithm of Parallel Garbage Collection Based on LISP 2 for Multi-core System[C]. ICIC 2010, CCIS 93, 2010:469–476.
    [30]张聪品,吴长茂,赵理莉.多核系统下并行节点复制垃圾收集算法[J].计算机应用,2010,30(11):2876-2879.
    [31] Apache Harmony - Apache Harmony DRL Virtual Machine[EB/OL].http://harmony.apache.org/subcomponents/drlvm/index.html.
    [32]周伟明.多核计算与程序设计[M].武汉:华中科技大学出版社出版, 2009.
    [33]艾克萨威尔,依恩加尔著,张云泉,陈英译.并行算法导论[M].北京:机械工业出版社,2004.
    [34]张宁,张春宇,杨霞等.垃圾收集器实时化的研究[J].微电子学与计算机[J],2010,27(1):34-37.
    [35]卢凯,周旭,迟万庆.FLSP:一个高效的系统级垃圾收集算法[J].计算机工程与科学,2010,32(11):119-123.
    [36]周寻.并发垃圾收集器及其调度方法的研究[J].计算机应用与软件,2010,27(9):4-6.
    [37]李伟明,李之棠.一种基于神经网络的垃圾收集调度方法[J].小型微型计算机系统,2007,28(7):1173-1176.
    [38]林春晓.基于携带证明的代码的垃圾收集过程验证[D].合肥,中国科学技术大学,2008.
    [39]赵立成.Java虚拟机的内存管理策略的研究[D].成都:电子科技大学,2007.
    [40]王皓.一种内存泄漏检测技术的研究和实现[D].北京:北京交通大学,2008.
    [41]吴廷鹏.垃圾收集器中大对象管理及显式内存管理的研究[D].合肥:中国科学技术大学,2008.
    [42]廖雄飞.微软公共语言支撑技术分析与研究[D].广州:华南理工大学,2003.
    [43]谌宁,覃征.基于嵌入式Java虚拟机的垃圾回收算法[J].计算机应用,2005,25(1):218-219,223.
    [44]龚舒群.JAVA虚拟机中的内存管理技术[J].微型机与应用,1999,10:36-38.
    [45]肖德宝,李伟,彭菲等.改进的自适应分代式垃圾收集[J].华中师范大学学报(自然科学版),2005,39(4):461-465.
    [46] Shaoshan Liu,Ligang Wang,Xiao-Feng Li et al.Space-and-Time Efficient Garbage Collectors for Parallel Systems[J]. International Journal of Parallel Programming, 2010, 1-22.
    [47] Simon Marlow,Tim Harris,Roshan P. James .Generational-Copying Garbage Collection with a Block-Structured Heap[C]. Proceedings of the 2008 International Symposium on Memory Management,2008,11-20.
    [48] Hezi Azatchi, Yossi Levanoni, Harel PazAn et al. On-the-Fly Mark and Sweep Garbage CollectorBased on Sliding Views[C]. Proceedings of the International Symposium on Memory Management, 2000 , 155-166.
    [49] Muthukumar,Janakiram. Yama: A scalable generational garbage collector for Java in multiprocessor systems[[C]. IEEE Transactions on Parallel and Distributed Systems, 2006, February, 148-159.
    [50] Filip Pizlo, Erez Petrank,Bjarne Steensgaard.A Study of Concurrent Real-Time Garbage Collectors[C].Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, 2008, June, 33-43.
    [51] Chris Hawblitzel,Erez Petrank.Automated Verification of Practical Garbage Collectors[C].Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages,2009,January, 441-453.
    [52]张聪品,吴长茂,赵理莉. CUDA平台下LISP2算法垃圾收集并行研究[J].计算机工程与应用,2010,46(33):75-77,138.
    [53]张舒等. GPU高性能运算之CUDA[M].北京:中国水利水电出版社,2009.
    [54] What is CUDA [EB/OL]. http://www.nvidia.com/object/what_is_cuda_new.html.
    [55]吴长茂,张聪品,张慧云等. CUDA平台下多核GPU高性能并行编程研究[J].河南机电高等专科学校学报,2011,19(1):19-21,29.
    [56] CUDA Toolkit 3.1 (June2010)[EB/OL].http://developer.nvidia.com/ object/ cuda_3_1_downloads.html.
    [57] GPU Computing SDK code sample[EB/OL].http://developer.download.nvidia.com/compute/cuda/3_1/sdk/gpucomputingsdk_3.1_win_32.exe.
    [58] CUDA Developer Guide for Optimus Platforms[EB/OL]. http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/CUDA_Developer_Guide_for_Optimus_Platforms.pdf.
    [59] NVIDIA_CUDA_C_ProgrammingGuide_3.1[EB/OL].http://developer.download.nvidia.com compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf.
    [60] CUDA Toolkit 3.2 (January 2011) [EB/OL].http://developer.nvidia.com /object /cuda_3_2_downloads.html.
    [61] GeForce(?)(精视(?))GT 335M GPU [EB/OL].http://www.nvidia.cn/object/product_geforce_gt_335m_cn.html.
    [62] NVIDIA CUDA计算统一设备架构编程指南[EB/OL]. http://g.csdn.net/5089972

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700