Research on High Availability Issues in Distributed File Systems

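English title: Research on High Availability of Distributed File System
Author: Shi Xiaodong (史小冬)
Degree level: Doctoral
Discipline: Computer Architecture
Keywords: Distributed file system; High availability; Distributed state recovery; Service continuity; Distributed file system high availability level (DFS_HAL)
Year of degree: 2002
Supervisor: Sun Ninghui (孙凝晖)
Discipline code: 081201
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology)
Submission date: 2002-05-01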

Abstract

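High availability is an important research topic for distributed file systems. Current research on the problem lacks a method for quantitative or qualitative analysis. Many systems are built only for the needs of one class of application (for example, highly available disk access), and there is no qualitative theory to guide two questions: how far a highly available distributed file system for a given class of application should go, and how the relevant key techniques should be used to build one.
     Because the availability of a software system cannot be analyzed quantitatively the way the availability of a hardware system can, this dissertation studies the high availability of distributed file systems from a qualitative perspective. Starting from the application patterns of distributed file systems, it performs a cluster analysis of the factors that affect their availability. Using the failure factors and the recovery-target factors as its guiding thread, it defines high availability for distributed file systems and, on that basis, builds a qualitative analysis model for the problem: the DFS_HAL (Distributed File System High Availability Level) model.
     Having built the DFS_HAL model, the dissertation combines it with the implementation techniques of distributed file systems and uses matrix analysis to study the key techniques for implementing a highly available distributed file system under different application requirements. It focuses on two techniques in the DFS_HAL model, DSR_T (Distributed State Recovery Technology) and CFS_T (Continuous File Service), and proposes the transfer-controlling policy in DSR_T, sufficient conditions for service continuity, and the heuristic request algorithm for metadata operations in CFS_T.
     As a practical application of the DFS_HAL model, the dissertation presents the design and implementation of the high availability system for DCFS, the Dawning cluster file system. This includes a method that uses a redundant in-memory recording mechanism to guarantee the structural consistency of the distributed file system.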
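     The main contributions of this dissertation are as follows:
     (1) Based on the application patterns of distributed file systems and a cluster analysis of their failure factors and recovery-target factors, it proposes, for the first time, a model for qualitative analysis of high availability in distributed file systems: the DFS_HAL model.
     (2) Using the DFS_HAL model together with the implementation techniques of distributed file systems and matrix analysis, it examines the key techniques for achieving high availability in a distributed file system, which provides guidance for building such systems.
     (3) It proposes the transfer-controlling policy in the DSR_T technique.
     (4) It proposes sufficient conditions for service continuity and the heuristic request algorithm for metadata operations in the CFS_T technique.
     (5) It proposes a method that uses a redundant in-memory recording mechanism to guarantee the structural consistency of a distributed file system.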

