Research on Storage Methods for High-Performance Computers

Author: Li Enyou
Degree level: Doctoral
Discipline: Computer Architecture
Keywords: Computer Architecture; Parallel Memory System; Hierarchical Memory System; Cache; XOR Skewed Scheme; Algorithm; Pentium Computer System
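Degree year: 1997
Advisors: Xia Peisu; Zhang Xiang; Liu Zhiyong
Discipline code: 081201
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology)
Submission date: 1997-05-01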

Abstract

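As semiconductor technology has advanced, the access speed of main memory devices can no longer keep up with the rate at which processors need data. Parallel memory systems and hierarchical memory systems have therefore been widely adopted in all kinds of computer systems to raise the average access speed of the memory system as a whole.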
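     In practice, however, traditional parallel memory systems and hierarchical memory systems do not always deliver the expected benefit, because processor accesses cause memory bank conflicts in a parallel memory system and cache line conflicts in a cache. Further study shows that the address mapping method used in parallel and hierarchical memory systems has a large effect on their performance.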
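To make the bank-conflict problem concrete, here is a minimal sketch assuming a conventional low-order interleaved memory with N = 8 banks, in which word address a is stored in bank a mod N. The bank count and the function names are illustrative assumptions, not taken from the thesis.

```python
# Minimal sketch: bank conflicts in a conventional low-order interleaved memory.
# Assumption (illustrative only): N = 8 banks, word address a lives in bank a % N.
N_BANKS = 8


def interleaved_bank(addr: int, n_banks: int = N_BANKS) -> int:
    """Bank that holds word address `addr` under low-order interleaving."""
    return addr % n_banks


def distinct_banks(start: int, stride: int, count: int, n_banks: int = N_BANKS) -> int:
    """Number of distinct banks touched by `count` accesses at a fixed stride.

    If the result equals `count`, all accesses can proceed in one parallel
    memory cycle; a smaller result means some banks are hit more than once
    and those accesses must be serialized.
    """
    return len({interleaved_bank(start + k * stride, n_banks) for k in range(count)})


if __name__ == "__main__":
    for stride in (1, 2, 4, 8):
        used = distinct_banks(0, stride, N_BANKS)
        print(f"stride {stride}: {used} of {N_BANKS} banks used")
```

With an 8×8 matrix stored row by row, reading a column is exactly the stride-8 case: all eight elements fall into the same bank and have to be read one after another, even though eight banks are available.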
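     XOR skewed storage schemes are a class of very effective nonlinear skewed storage schemes. After studying many XOR schemes of practical value, the author proposes the LR-XOR skewed storage scheme. A parallel memory system using LR-XOR can access in parallel not only the contiguous access patterns that a traditional interleaved parallel memory system already serves in parallel, but also many access patterns common in scientific and engineering programs: the rows, columns, main P×Q blocks and scattered P×Q blocks of an N×N matrix, main vectors with stride 2^i, and shifted main vectors with stride 2^i. This can greatly raise the average access speed of the parallel memory system.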
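The idea behind XOR skewing can be sketched in a few lines. The abstract does not give the LR-XOR formula, so the sketch below uses a hypothetical scheme from the same family. It assumes an N×N matrix with N = 2^n banks and stores element (i, j) in bank i XOR rev_n(j), where rev_n reverses the n bits of j. It then checks which patterns map to N distinct banks. It is an illustration of the principle only, not the author's LR-XOR scheme.

```python
# Minimal sketch of an XOR skewed storage scheme (hypothetical, NOT the thesis's LR-XOR).
# Assumption: N = 2**n banks and an N x N matrix; element (i, j) goes to bank
# i XOR bit_reverse(j), where bit_reverse reverses the low n bits of j.
from itertools import product


def bit_reverse(x: int, n: int) -> int:
    """Reverse the low n bits of x (e.g. 0b001 -> 0b100 for n = 3)."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def xor_bank(i: int, j: int, n: int) -> int:
    """Bank holding matrix element (i, j) under the illustrative XOR scheme."""
    return i ^ bit_reverse(j, n)


def interleaved_bank(i: int, j: int, n: int) -> int:
    """Conventional scheme for comparison: row-major address modulo N."""
    N = 1 << n
    return (i * N + j) % N


def conflict_free(cells, bank_fn, n: int) -> bool:
    """True if the cells of an access pattern all map to distinct banks."""
    banks = [bank_fn(i, j, n) for i, j in cells]
    return len(set(banks)) == len(banks)


def row(r, N):
    return [(r, j) for j in range(N)]


def column(c, N):
    return [(i, c) for i in range(N)]


def main_block(bi, bj, P, Q):
    """P x Q block whose corner is aligned to a multiple of P (rows) and Q (columns)."""
    return [(bi * P + a, bj * Q + b) for a, b in product(range(P), range(Q))]


if __name__ == "__main__":
    n = 3
    N = 1 << n
    for name, fn in (("interleaved", interleaved_bank), ("xor-skewed", xor_bank)):
        rows_ok = all(conflict_free(row(r, N), fn, n) for r in range(N))
        cols_ok = all(conflict_free(column(c, N), fn, n) for c in range(N))
        blocks_ok = all(
            conflict_free(main_block(bi, bj, 2, 4), fn, n)
            for bi in range(N // 2)
            for bj in range(N // 4)
        )
        print(f"{name}: rows={rows_ok} columns={cols_ok} 2x4 blocks={blocks_ok}")
```

The interleaved scheme serves rows only. The XOR mapping also serves columns and aligned 2×4 (or 4×2) blocks in one cycle. In an aligned P×Q block with P·Q = N, the row index varies only in its low bits and the reversed column index only in its high bits, so their XOR takes every bank value once. The scattered blocks and stride-2^i vectors that the abstract lists for LR-XOR are not tested by this sketch.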
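     Building on a detailed analysis of cache structure, this thesis applies XOR skewed mapping to the address mapping of the data cache. Theoretical analysis shows that with the EE-XOR and LR-XOR mapping schemes, all data elements of many access patterns common in scientific and engineering programs can reside in the cache at the same time. More of a program's inherent data reuse is thereby turned into reuse of data held in the cache, which greatly raises the average access speed of the hierarchical memory system and lets the processor's computing power be fully exploited.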
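The same effect carries over to cache mapping. The sketch below is a hypothetical illustration, not the thesis's EE-XOR or LR-XOR mapping. It assumes a direct-mapped cache with 64 sets of 32-byte lines and a 512×512 matrix of 8-byte elements stored row-major. It compares the usual modulo set index with a simple XOR-folded index, which XORs the index bits with the next-higher bits of the line address.

```python
# Hypothetical cache geometry for illustration only (not the thesis's design):
# direct-mapped, 64 sets, 32-byte lines; matrix of 8-byte doubles, row-major.
LINE_BYTES = 32
NUM_SETS = 64
SET_BITS = NUM_SETS.bit_length() - 1  # log2(64) = 6
ELEM_BYTES = 8
ROW_ELEMS = 512  # one matrix row = 4096 bytes, a power of two


def modulo_set(addr: int) -> int:
    """Conventional index: low-order bits of the line address."""
    line = addr // LINE_BYTES
    return line % NUM_SETS


def xor_set(addr: int) -> int:
    """XOR-folded index: index bits XORed with the next SET_BITS line-address bits."""
    line = addr // LINE_BYTES
    return (line ^ (line >> SET_BITS)) & (NUM_SETS - 1)


def column_addresses(col: int, rows: int):
    """Byte addresses of the first `rows` elements of matrix column `col`."""
    return [(r * ROW_ELEMS + col) * ELEM_BYTES for r in range(rows)]


def sets_used(index_fn, addrs) -> int:
    """Number of distinct cache sets the given addresses map to."""
    return len({index_fn(a) for a in addrs})


if __name__ == "__main__":
    col = column_addresses(0, 32)
    print("modulo index:", sets_used(modulo_set, col), "distinct sets")
    print("XOR index:   ", sets_used(xor_set, col), "distinct sets")
```

Under the modulo index, all 32 column elements compete for a single set and evict one another. Under the XOR index they land in 32 different sets and can stay resident together. In hardware this fold costs one two-input XOR gate per index bit. The conventional tag (the line-address bits above the index) still identifies each line uniquely, because the original index bits can be recovered by XORing the stored index with the low bits of the tag.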
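     The author implemented the EE-XOR skewed storage scheme in the mapping logic of a cache system, a novel design that lets the cache fully exploit the locality of memory accesses during program execution. In the Pentium Heping (和平) experimental system designed by the author, the level-2 cache mapping uses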
With the rapid development of semiconductor technology, the gap in data access cycle time between fast microprocessors and relatively slow main memory systems has become more and more serious. Computer designers use parallel memory systems and hierarchical memory systems in their computers to reduce the average access time of the memory system.
    However, practical experience has shown that the traditional interleaved parallel memory architecture and hierarchical memory systems cannot serve well the data accesses used most frequently in a wide variety of application algorithms. This is because the most frequently used data patterns produce many memory bank conflicts in a traditional parallel memory system and many cache line conflicts in a cache memory system. The address mapping methods used in parallel memory systems and cache memory systems strongly affect these conflict rates.
    XOR schemes are a family of nonlinear skewed memory allocation schemes for parallel memory systems. After carefully studying earlier schemes, we present a new XOR scheme named LR-XOR. In a parallel memory system with N=2^i memory banks, the processing units can access most of the data patterns frequently used in scientific and engineering programs in a single memory access cycle. These include the row, column, main P×Q block, scattered P×Q block, main vector with stride 2^i and shifted main vector with stride 2^i of an N×N matrix. These parallel access properties reduce the memory system's average data access time.
    Comparing the behavior of parallel memory systems with that of cache memory systems, we found that memory schemes that work well in parallel memory systems should also have good properties when used as cache mapping schemes. Our theoretical analysis shows that if we use the EE-XOR or
