A Scalable Single-Image File System
Abstract
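Traditional distributed file systems cannot provide cluster systems with a strict single-system image, and because they have not kept pace with trends in computing technology, they cannot meet applications' demands on a cluster's I/O performance, scalability, and availability. The Dawning superserver is a typical cluster system; we developed the scalable single-image file system COSMOS for it and call its prototype S2FS. This dissertation mainly describes the design, implementation, and evaluation of S2FS.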
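     First, S2FS is a global file system that guarantees a strict single-system image by providing location transparency and strict UNIX file-sharing semantics. Without modifying the AIX operating system source code, we extended the kernel at the Vnode/VFS layer to connect S2FS seamlessly to the underlying platform. This preserves full binary compatibility with UNIX applications and shows that the virtual file system mechanism is an effective way to achieve this goal.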
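     Second, to improve the performance and scalability of S2FS, this dissertation studies and evaluates cooperative caching. Subject to avoiding system deadlock, we design a directory-based invalidation protocol and prove that it guarantees cache coherence. To improve performance further, we propose a dual-granularity cache coherence protocol and, on top of it, design a heuristic cache management algorithm; model analysis shows that it performs better than the widely used N-Chance algorithm.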
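     As a rough illustration of what a directory-based invalidation protocol involves, the sketch below keeps, for each block, the set of clients that cache it and invalidates all other holders when one client writes. This is not the S2FS protocol itself: the class name `Directory`, its methods, and the in-memory representation are assumptions made for the example, and deadlock avoidance and dual granularity are left out.

```python
from collections import defaultdict


class Directory:
    """Hypothetical manager-side directory: block id -> clients caching it."""

    def __init__(self):
        self.holders = defaultdict(set)

    def read(self, client, block):
        # Record that this client now holds a cached copy of the block.
        self.holders[block].add(client)

    def write(self, client, block):
        # Every other cached copy must be invalidated before the write proceeds.
        invalidated = sorted(self.holders[block] - {client})
        # After the write, the writer is the only holder of a valid copy.
        self.holders[block] = {client}
        return invalidated
```

     For example, after clients "a" and "b" both read block 1, a write by "a" returns ["b"] as the list of copies to invalidate and leaves "a" as the sole holder.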
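     Finally, to avoid a single-server bottleneck, S2FS separates data storage from metadata management and implements both in a distributed fashion. Besides storing and maintaining system metadata (such as file inodes and superblocks), the metadata management server records where data is cached and maintains the coherence of the cooperative cache. On the storage server side, we implement network disk storage grouping and a software RAID 1 model; the underlying storage builds on the reliable JFS and asynchronous I/O, which improves I/O bandwidth and storage availability.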
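     The availability benefit of a software RAID 1 model can be sketched as follows: every block is written to all replicas in a group, so a read still succeeds after one replica fails. The class `MirroredStore`, the use of dictionaries in place of disks, and the `failed` set are assumptions for illustration only; the sketch does not model JFS, asynchronous I/O, or resynchronizing a replica that missed writes while it was down.

```python
class MirroredStore:
    """Hypothetical RAID 1 group: each block is written to every healthy replica."""

    def __init__(self, replicas=2):
        # Each replica is an in-memory dict standing in for a network disk.
        self.disks = [dict() for _ in range(replicas)]
        self.failed = set()

    def write(self, block, data):
        # Mirror the write to all replicas that are currently healthy.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block):
        # Serve the read from the first healthy replica that holds the block.
        for i, disk in enumerate(self.disks):
            if i not in self.failed and block in disk:
                return disk[block]
        raise IOError(f"block {block} unavailable")
```

     With two replicas, marking replica 0 as failed after a write still lets the block be read from replica 1.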
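     Although this dissertation studies scalability techniques suited to cluster file systems while preserving the single-system image and binary compatibility, applications' demand for I/O is unending, and both their I/O access characteristics and the trends in computing technology keep changing. All of this poses greater challenges for our future development of new distributed file systems.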
Traditional distributed file systems cannot provide clusters with a strict single-system image, and because they have failed to keep up with trends in computing technology, they cannot meet cluster applications' requirements for I/O performance, scalability, and availability either. The Dawning super-server is a typical cluster system; we have developed the COSMOS file system for it and call its prototype S2FS, an acronym for Scalable Single-image File System. This dissertation mainly presents the design, implementation, and evaluation of S2FS.
    First, S2FS is a global file system. To maintain a strict single-system image, it provides location transparency and strong UNIX file-sharing semantics. Even without the AIX operating system's source code, we can add S2FS to AIX seamlessly at the Vnode/VFS interface so that S2FS maintains ABI/API compliance with the UNIX file system, demonstrating that the Virtual File System is an effective mechanism for achieving this objective.
    Further, this dissertation highlights the research and evaluation of cooperative caching, which is used to improve S2FS's performance and scalability. After giving a sufficient condition for a deadlock-free design, we introduce a directory-based invalidate cache coherence protocol and verify its cache coherence using belief-based reasoning. We then propose a dual-granularity cache coherence protocol to further improve system performance, and devise a hint-based heuristic cooperative caching algorithm under the dual-granularity protocol. Analytical models are established for both the heuristic algorithm and the state-of-the-art N-Chance algorithm; the analytical results show that the heuristic algorithm effectively reduces I/O response time compared with N-Chance in almost every case.
    Finally, to eliminate the central file server bottleneck found in traditional file systems, S2FS splits the traditional server's functionality into two separate pieces, data storage and metadata management, and distributes each among cooperating networked machines. The metadata management server, which we call the manager, is responsible for storing and maintaining system metadata (including file inodes and the superblock), and it also records the location of data in the clients' caches so as to preserve cooperative cache coherence. The storage server implements network disk striping
