Design and Implementation of a Linux Distributed Storage System
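Abstract
Owing to the rapid development of integrated-circuit technology, the computing power of computers has grown exponentially, whereas the I/O speed of storage devices such as disks has grown slowly; the performance gap between processors and I/O has created a serious bottleneck. At the same time, with the rapid development of information technology, the scale of stored data is steadily reaching the terabyte (TB) and petabyte (PB) range. Relying on disks and tapes alone can no longer meet users' needs.
     To address these problems, there is a growing need for a file storage system that offers large capacity, high reliability, high availability, high performance, dynamic scalability, and ease of maintenance. After studying the two network storage architectures in wide use today (NAS and SAN), this work proposes a Linux-based distributed storage system that combines the technical advantages of NAS and SAN while overcoming the shortcomings of both.
     Each storage node stores a portion of the data files together with the corresponding metadata. When a storage node starts, it communicates with the storage nodes on the same local area network and eventually obtains the global metadata. The storage nodes are independent of one another, which improves the parallelism and scalability of the system. For hot files, borrowing from the idea of RAID, contiguous data is split into blocks of equal size and each block is written to a different storage node, so that the file can be accessed in parallel and therefore faster. To make the storage system convenient to use, the LDFS (Linux Distribute File System) file system was designed and implemented on the application server side; once mounted, it can be used like an ordinary file system.
     Tests show that, when data of the same size is accessed in the same network environment, the I/O performance of the Linux distributed storage system is consistently better than that of the CIFS and SMB file systems.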
Because of the rapid development of integrated circuits, the computing capability of computers has grown exponentially. The I/O speed of storage devices such as disks, however, has increased slowly, and the mismatch with the CPU's computing capability has created a bottleneck. At the same time, with the rapid development of information technology, the scale of stored data is gradually reaching the TB and PB range. It is hard to satisfy user needs by depending on disk and tape alone.
     To address these problems, it is necessary to design a file storage system with large capacity, high reliability and availability, high performance, dynamic scalability, and ease of maintenance. Based on a study of NAS and SAN, two kinds of network storage architecture, this work puts forward a distributed storage system built on soft-RAID technology. The system not only has the merits of NAS and SAN but also overcomes their shortcomings.
     Every storage node stores part of the data files and the matching metadata. When a storage node starts up, it communicates with the other nodes in the same local area network and eventually obtains all of the metadata information. Because the storage nodes are independent of one another, the system's parallelism and scalability improve.
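The metadata exchange described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `build_global_view`, the dictionary layout (file path mapped to an attribute record), and the rule that the first entry seen for a path wins are all assumptions made for illustration. The abstract does not describe the wire protocol or how conflicts are resolved.

```python
from typing import Dict, Iterable

# Hypothetical layout: file path -> attributes such as owning node and size.
Metadata = Dict[str, dict]


def build_global_view(local: Metadata, peers: Iterable[Metadata]) -> Metadata:
    """Merge this node's metadata with the metadata reported by LAN peers.

    Assumption: on a path collision the entry seen first is kept, so the
    local entry takes precedence over any peer's entry.
    """
    view = dict(local)  # copy, so the local table is not modified
    for peer in peers:
        for path, entry in peer.items():
            view.setdefault(path, entry)
    return view


if __name__ == "__main__":
    local = {"/a": {"node": "n1", "size": 10}}
    peers = [{"/b": {"node": "n2", "size": 20}}]
    print(sorted(build_global_view(local, peers)))
```

In this sketch each node ends up with a lookup table that tells it which node holds a given file, which is the role the abstract assigns to the global metadata.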
     For hotspot files, borrowing the idea of RAID, a consecutive stream of data is divided into a series of blocks of the same size, and each block is written to a different storage node, so that parallel access can speed up access to the file.
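The striping idea can be sketched as follows. This is an illustration under stated assumptions, not the system's actual code: the names `stripe` and `reassemble`, the round-robin placement rule, and the in-memory representation of nodes are hypothetical. The abstract only states that equal-sized blocks are written to different storage nodes.

```python
from typing import Dict, List, Tuple

Placement = Dict[int, List[Tuple[int, bytes]]]  # node id -> [(block index, block)]


def stripe(data: bytes, num_nodes: int, block_size: int) -> Placement:
    """Split data into fixed-size blocks and place them round-robin on nodes.

    Assumption: block i goes to node i % num_nodes. The final block may be
    shorter than block_size when len(data) is not a multiple of it.
    """
    if num_nodes <= 0 or block_size <= 0:
        raise ValueError("num_nodes and block_size must be positive")
    placement: Placement = {node: [] for node in range(num_nodes)}
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        placement[index % num_nodes].append((index, block))
    return placement


def reassemble(placement: Placement) -> bytes:
    """Rebuild the original byte stream by ordering blocks by their index."""
    blocks = [pair for node_blocks in placement.values() for pair in node_blocks]
    blocks.sort(key=lambda pair: pair[0])
    return b"".join(block for _, block in blocks)


if __name__ == "__main__":
    payload = bytes(range(10))
    layout = stripe(payload, num_nodes=3, block_size=4)
    assert reassemble(layout) == payload
```

Because consecutive blocks land on different nodes, a reader can request them from several nodes at once, which is where the expected speed-up for hot files comes from.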
     To make the storage system easy to use, the LDFS file system was designed and implemented on the application server; once it is mounted, it can be operated like an ordinary file system.
     Tests indicate that the I/O performance of the Linux distributed storage system is much better than that of the Samba file system and CIFS when data of equal size is accessed in the same network environment.
