Adaptive Dynamic Optimization in the Java Virtual Machine

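English title: The Adaptive Optimizations in Java Virtual Machine
Author: Zou Qiong (邹琼)
Degree level: Doctoral
Discipline: Computer Architecture
Chinese keywords: Java virtual machine; adaptive dynamic optimization framework; garbage collection; object locality; prefetching; affinity
English keywords: JVM; framework of adaptive optimization; garbage collection; object locality; prefetch; affinity
Degree year: 2008
Advisor: Hu Weiwu (胡伟武)
Discipline code: 081201
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2008-04-01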

Abstract

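Java is widely used for software development in many fields because of its software-engineering advantages. Java programs run in the dynamic environment of the Java virtual machine, which offers many advantages over traditional statically compiled binary code: code portability, safety, automatic memory and thread management, dynamic class loading, and so on. These convenient and powerful features greatly improve programmer productivity and are therefore widely used. However, the same dynamic features make some traditional static compilation techniques inapplicable, so researchers have kept exploring new compilation techniques that deliver better performance on virtual machines.
     Lacking runtime information, static compilation resorts to fairly complex global analysis and still cannot achieve ideal results. With the Java virtual machine, compilation and optimization take place while the program runs, so industry has long worked on adaptive optimization techniques that use dynamic runtime information to decide which optimizations to apply.
     Focusing on the locality problems in Java programs and their impact on application performance, this dissertation systematically and thoroughly studies adaptive optimization techniques in the Java virtual machine. Its main innovations and contributions are as follows: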
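     First, we design and implement a low-overhead adaptive dynamic optimization framework. The framework collects fine-grained information through instrumentation. While the program runs, we adaptively adjust the instrumentation according to feedback to reduce overhead, and, to further reduce the impact of instrumentation, we exploit the characteristics of Java programs to keep the number of instrumentation points as small as possible.
     Compared with earlier static analysis work, our work operates at runtime and thus avoids the inflexibility caused by changes in data sets. Compared with existing dynamic analysis work, we are the first to implement an adaptive dynamic optimization framework in a Java virtual machine, compensating for the shortcomings of the dynamic compilation techniques currently available in Java virtual machines. To lower the framework's runtime overhead, we apply a series of optimizations tailored to the Java language, covering the framework design and the design of the object-access instrumentation. These techniques effectively reduce overhead and further improve the performance of Java programs. The experimental results show that the overhead of the adaptive optimization framework is at most 2.5% and 1.7% on average. The framework lays a good foundation for the locality optimizations proposed later.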
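     Second, we propose a fast sliding mark-compact algorithm. It records a bitmap and a live-block pool during the mark phase and computes a per-block offset table during the compaction phase, turning heap traversal into lookups in the per-block offset table and greatly reducing the cost of traversing the heap. The live-block pool also makes the algorithm easy to apply in parallel garbage collection. Experiments show that the algorithm improves the performance of the industry-standard benchmarks SPECjbb2005, SPECjvm98, and DaCapo to varying degrees, by up to 8.9%. Program locality is also better than with the linear mark-compact algorithm: compared with depth-first traversal order, the DTLB (Data Translation Lookaside Buffer) miss rate improves by up to 11% and the L2 cache miss rate by up to 13.6%.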
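The offset-table idea can be illustrated with a minimal sketch. It assumes a heap addressed in words, a mark bitmap with one bit per live word, and a fixed block size. The class name `SlidingCompactSketch`, the constant `BLOCK_WORDS`, and both method names are hypothetical and are not taken from the dissertation; the live-block pool is omitted for brevity.

```java
import java.util.BitSet;

// Illustrative sketch only: not the dissertation's data structures.
final class SlidingCompactSketch {
    // Hypothetical block size, measured in heap words.
    static final int BLOCK_WORDS = 8;

    // liveWords: bit i is set when heap word i belongs to a live object.
    // Returns, for each block, the post-compaction address of the first
    // live word in that block (a prefix sum of live words in earlier blocks).
    static int[] buildBlockOffsetTable(BitSet liveWords, int heapWords) {
        int blocks = (heapWords + BLOCK_WORDS - 1) / BLOCK_WORDS;
        int[] blockDest = new int[blocks];
        int liveSoFar = 0;
        for (int b = 0; b < blocks; b++) {
            blockDest[b] = liveSoFar;
            int start = b * BLOCK_WORDS;
            int end = Math.min(start + BLOCK_WORDS, heapWords);
            liveSoFar += liveWords.get(start, end).cardinality();
        }
        return blockDest;
    }

    // New address of the object starting at addr: one table lookup plus
    // the number of live words that precede addr inside its own block.
    static int forwardingAddress(BitSet liveWords, int[] blockDest, int addr) {
        int block = addr / BLOCK_WORDS;
        int blockStart = block * BLOCK_WORDS;
        return blockDest[block] + liveWords.get(blockStart, addr).cardinality();
    }
}
```

With the table in place, an object's new address comes from one lookup and a bit count limited to its own block, rather than from a walk over every preceding object in the heap. Objects keep their relative order, which is what makes the compaction a sliding one.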
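     Third, we propose a prefetch optimization algorithm, built on the adaptive dynamic optimization framework, to improve program locality. While the just-in-time compiler compiles the program, the algorithm also inserts instrumentation that collects information about the objects accessed in memory. If accesses to related objects are detected during the current run, the prefetch controller inserts the corresponding prefetch instructions. The key to the adaptive prefetch optimization is the trade-off between prefetch accuracy and runtime overhead. To ensure prefetch accuracy, we instrument the program; to reduce runtime overhead, we control the insertion of prefetch instructions and remove instrumentation that is no longer useful. Experimental results show that the algorithm improves the performance of the industry-standard benchmarks SPECjbb2005, SPECjvm98, and DaCapo to varying degrees, by up to 18.1% and 7.15% on average. The runtime overhead is below 4%, and the memory overhead is negligible.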
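     Fourth, we describe a garbage collection algorithm based on object affinity. The algorithm uses a hardware performance analyzer to locate objects that frequently cause cache misses, builds an object affinity graph from the affinity between objects, and, combined with the garbage collection algorithm, places objects with high affinity at adjacent positions in the heap. High affinity means that after one object is accessed, the other is very likely to be accessed next, so placing them together improves locality between objects. Experimental results show that the affinity-based garbage collection algorithm clearly improves the performance of SPECjbb2005, SPECjvm98, and DaCapo, by up to 4.9% and 3.4% on average. Because the information is collected with the hardware performance analyzer, the profiling overhead is very low, 0.47% on average. Finally, we combine this algorithm with the adaptive prefetch optimization; the results show that performance does not degrade for most programs and even improves for a few.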
The Java language is widely used in software development for its software-engineering merits. Java applications run on the Java virtual machine. Compared with binary code generated by traditional compilation, Java offers better modularity, platform independence, type safety, and so on. These features make Java well suited to the fast and safe development of large-scale software. However, the same characteristics make some traditional compilation techniques inapplicable, so researchers keep exploring new compilation techniques to get better performance on the Java virtual machine.
     Lacking runtime information, static compilation resorts to complex global analysis, which still cannot deliver satisfactory results. With the Java virtual machine, compilation and optimization happen at runtime, so industry has focused on adaptive optimizations that optimize applications according to runtime feedback.
     This dissertation systematically and thoroughly investigates adaptive optimization and the locality problem in the Java virtual machine. The contributions of this work are as follows:
     First, we design and implement an efficient adaptive optimization framework. The framework collects fine-grained information through instrumentation, which is itself adjusted according to runtime feedback. We also exploit the characteristics of Java applications to reduce the side effects of instrumentation.
     Compared with static analysis, our work runs at runtime and does not depend on a particular data set. Compared with existing dynamic analysis, we design an adaptive optimization framework in the Java virtual machine, filling a gap in dynamic compilation techniques. We try to minimize overhead throughout the framework, in its design, its instrumentation, and so on. The results show that the overhead of the framework is 1.7% on average and 2.5% at most. The framework provides a platform for the locality optimizations in subsequent chapters.
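One way to picture instrumentation that is adjusted by runtime feedback is the sketch below. It is an illustration under stated assumptions, not the framework's actual design: the `AdaptiveProbes` and `Probe` names, the per-site sample budget, and the `review` step are all hypothetical. Each probe counts samples while enabled, and a periodic review switches off probes that have already gathered enough data.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and the budget policy are assumptions.
final class AdaptiveProbes {
    // One instrumentation point placed at a (hypothetical) code site.
    static final class Probe {
        final String site;
        boolean enabled = true;
        long samples = 0;

        Probe(String site) {
            this.site = site;
        }

        // Called by instrumented code; costs almost nothing once disabled.
        void hit() {
            if (enabled) {
                samples++;
            }
        }
    }

    private final List<Probe> probes = new ArrayList<>();
    private final long sampleBudget;

    AdaptiveProbes(long sampleBudget) {
        this.sampleBudget = sampleBudget;
    }

    Probe install(String site) {
        Probe p = new Probe(site);
        probes.add(p);
        return p;
    }

    // Feedback step: disable probes that have reached the sample budget.
    // Returns how many probes were switched off in this review.
    int review() {
        int disabled = 0;
        for (Probe p : probes) {
            if (p.enabled && p.samples >= sampleBudget) {
                p.enabled = false;
                disabled++;
            }
        }
        return disabled;
    }
}
```

A real virtual machine would more likely patch or recompile the instrumented method than test a flag on every hit; the flag simply keeps the sketch self-contained.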
     Second, we propose a fast sliding mark-compact algorithm. Sliding mark-compact preserves allocation order, which is the best order for locality, but the traditional design makes the algorithm's overhead too high. The fast algorithm proposed in this dissertation reduces that overhead with a mark bit table, a live block pool, and an offset table. The results show that it achieves up to 8.9% speedup on the industry-standard benchmarks SPECjbb2005, SPECjvm98, and DaCapo on the Pentium 4, along with up to an 11% improvement in DTLB misses and up to a 13.6% reduction in L2 cache misses.
     Third, we build a dynamic prefetch optimization on the adaptive framework. We instrument the program in the JIT compiler to profile load addresses and periodically detect stride patterns at runtime. When a stride pattern is discovered, we inject a prefetch instruction and eliminate the effect of the instrumentation. The key design issue is the trade-off between prefetching accuracy and runtime overhead. To reduce runtime overhead, we develop techniques to remove redundant instrumentation, control the injection of prefetch instructions, and disable useless instrumentation. Our pattern detection is lightweight: we use a sliding window to filter the trace information for runtime analysis, and a stride frequency array that covers strides between -64 and 64. Finally, the experimental evaluation shows that the prefetch optimization speeds up the SPECjbb2005, SPECjvm98, and DaCapo benchmarks by up to 18.1%, with an average of 7.15%. The maximum runtime overhead is less than 4%, and the memory overhead is negligible.
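The sliding window and stride frequency array can be sketched as follows. This is a minimal illustration, not the dissertation's detector: the abstract gives only the stride range of -64 to 64, so the stride unit, the window size, the dominance threshold `minShare`, and the `StrideDetector` class itself are assumptions.

```java
// Illustrative sketch only: window size and threshold are assumptions.
final class StrideDetector {
    static final int MAX_STRIDE = 64;

    // freq[s + MAX_STRIDE] counts occurrences of stride s inside the window.
    private final int[] freq = new int[2 * MAX_STRIDE + 1];
    // Ring buffer of freq slots for recent strides; -1 marks an out-of-range stride.
    private final int[] window;
    private int filled = 0;
    private int next = 0;
    private long lastAddr;
    private boolean hasLast = false;

    StrideDetector(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        window = new int[windowSize];
    }

    // Record one profiled load address.
    void record(long addr) {
        if (hasLast) {
            long stride = addr - lastAddr;
            boolean inRange = stride >= -MAX_STRIDE && stride <= MAX_STRIDE;
            int slot = inRange ? (int) stride + MAX_STRIDE : -1;
            if (filled == window.length) {
                // Window is full: forget the oldest stride before adding the new one.
                int old = window[next];
                if (old >= 0) {
                    freq[old]--;
                }
            } else {
                filled++;
            }
            window[next] = slot;
            if (slot >= 0) {
                freq[slot]++;
            }
            next = (next + 1) % window.length;
        }
        lastAddr = addr;
        hasLast = true;
    }

    // Returns the most frequent non-zero stride if it covers at least
    // minShare of a full window; returns 0 when no stride dominates.
    int dominantStride(double minShare) {
        if (filled < window.length) {
            return 0;
        }
        int best = -1;
        for (int i = 0; i < freq.length; i++) {
            if (i == MAX_STRIDE) {
                continue; // stride 0 is not worth prefetching
            }
            if (best < 0 || freq[i] > freq[best]) {
                best = i;
            }
        }
        boolean dominant = freq[best] > 0 && freq[best] >= minShare * filled;
        return dominant ? best - MAX_STRIDE : 0;
    }
}
```

A prefetch controller could poll `dominantStride` periodically and, on a non-zero result, request a prefetch at the current address plus a multiple of that stride. Because old strides leave the window, a pattern that disappears stops being reported.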
     Finally, we combine object affinity with garbage collection. We first use a hardware performance analyzer to locate delinquent objects, determine the affinity between them, and build an affinity graph. The garbage collector consults the affinity graph and co-locates related objects. This design improves locality among delinquent objects, because related objects tend to be accessed one after another. The experimental evaluation shows that garbage collection based on object affinity speeds up the SPECjbb2005, SPECjvm98, and DaCapo benchmarks by up to 4.9%, with an average of 3.4%, while the overhead of the hardware-based profiling is 0.47%. We also apply the prefetch optimization on top of it; the results show that performance does not degrade, and for some applications it improves.
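A rough sketch of an affinity graph and a placement order derived from it is given below. It is a simplified illustration under assumptions: objects are identified by plain strings, an edge weight counts how often two delinquent objects were observed one after the other, and the greedy chain in `placementOrder` is a stand-in for whatever policy the collector actually uses. The `AffinityGraph` class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: identifiers and the greedy policy are assumptions.
final class AffinityGraph {
    // weight.get(a).get(b): how often a and b were accessed back to back.
    private final Map<String, Map<String, Integer>> weight = new HashMap<>();

    // Record that objects a and b were observed one after the other.
    void recordPair(String a, String b) {
        if (a.equals(b)) {
            return;
        }
        weight.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
        weight.computeIfAbsent(b, k -> new HashMap<>()).merge(a, 1, Integer::sum);
    }

    // Greedy chain: starting from one object, repeatedly append the
    // not-yet-placed neighbour with the highest affinity to the last one.
    List<String> placementOrder(String start) {
        List<String> order = new ArrayList<>();
        Set<String> placed = new HashSet<>();
        String current = start;
        while (current != null) {
            order.add(current);
            placed.add(current);
            Map<String, Integer> neighbours =
                    weight.getOrDefault(current, Collections.<String, Integer>emptyMap());
            String bestNext = null;
            int bestWeight = 0;
            for (Map.Entry<String, Integer> e : neighbours.entrySet()) {
                if (!placed.contains(e.getKey()) && e.getValue() > bestWeight) {
                    bestNext = e.getKey();
                    bestWeight = e.getValue();
                }
            }
            current = bestNext;
        }
        return order;
    }
}
```

A collector following this idea would copy or slide the objects in the returned order so that strongly related objects end up adjacent in the heap. Objects that are not reached by the chain would be laid out by the collector's default order.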

