Recovery Mechanism of the Distributed and Parallel Database System DP-SQL

Author: Wang Yu (王宇)
Degree level: Master's
Discipline: Computer Architecture
Keywords: Agent ; Log Recovery ; Backup Recovery ; Dynamic Recovery ; Distributed and Parallel Database Systems
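Degree year: 2003
Supervisor: Liu Xinsong (刘心松)
Discipline code: 081201
Degree-granting institution: University of Electronic Science and Technology of China (电子科技大学)
Thesis submission date: 2003-03-01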

Abstract

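A Distributed and Parallel Database System (DPDBS) is the product of combining distributed computing, parallel processing, and network technology. It manages dispersed data effectively and also performs well. As database applications have expanded, distributed and parallel database systems have drawn increasing attention and have become one of the most active research areas in computer technology.
     DP-SQL is a distributed and parallel database system developed independently by the 8010 Research Lab of the University of Electronic Science and Technology of China. It is built on MySQL, currently the most popular open-source database, and it retains MySQL's high processing speed while offering the high reliability, high throughput, and large storage capacity of a distributed and parallel system. By function, the system is divided into four parts: the user interface subsystem, the communication subsystem, the server management subsystem, and the execution subsystem. The interface subsystem resides on the client; it accepts user input and returns execution results to the user. The communication subsystem provides an efficient and reliable message-passing mechanism for the other parts. The server management subsystem keeps the whole system running normally and supplies system information to the execution subsystem. The distributed and parallel execution subsystem carries out the actual database operations.
     The recovery mechanism is the foundation that lets a distributed and parallel database system provide database service normally. When the database fails (through hardware or software faults), the recovery mechanism restores it after the fault has been cleared, returning it to a normal state so that it can continue to provide service. In addition, a database node must recover its local database when it restarts, so that it reaches a state globally consistent with the databases on the other nodes in the system.
     Building on an in-depth study, this thesis discusses the design of the distributed and parallel database DP-SQL and focuses on its recovery mechanism, in particular the design and implementation of log recovery. After analyzing the shortcomings of traditional log recovery mechanisms, the thesis proposes a new agent-based dynamic recovery protocol. The protocol uses agents to cache database operations newly issued during recovery and, once partial recovery from the log is complete, relies on these cached operations for further recovery. Compared with traditional log-based recovery algorithms, the dynamic recovery protocol preserves system consistency while reducing the extra system overhead of recovery as well as the impact on, and excessive dependence on, individual nodes, thereby improving the overall performance and reliability of the system.
     Chapter 1 reviews the development of distributed and parallel databases. Chapter 2 introduces the characteristics of distributed and parallel database systems and their traditional recovery mechanisms. Chapter 3 discusses the system architecture of DP-SQL. Chapter 4 examines the recovery mechanism of DP-SQL in depth. Chapter 5 analyzes in detail the agent-based dynamic recovery protocol in DP-SQL and demonstrates its advantages through performance analysis. Chapter 6 summarizes the thesis and looks ahead to future research.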
A Distributed and Parallel Database System (DPDBS) combines distributed computing, parallel processing, and network technology. It is not only powerful at managing distributed data but also performs well in parallel processing. As database applications have expanded, DPDBS has gained increasing recognition and has become one of the most active and promising research areas in computer science.
    DP-SQL is a distributed and parallel database system developed by the 8010 Research Lab. Built on MySQL, the most popular open-source database system, it retains MySQL's high performance while gaining most of the strengths of distributed and parallel systems, such as high reliability, high availability, high throughput, and large storage capacity. The system is divided into four subsystems: the user interface subsystem, the communication subsystem, the server management subsystem, and the distributed and parallel execution subsystem. The user interface subsystem resides on the clients; it sends queries from the client to an appropriate server and retrieves the results. The communication subsystem provides an efficient and reliable message-passing mechanism for the other modules. The server management subsystem consists of multiple services that keep the whole system running properly. Finally, the distributed and parallel execution subsystem handles the details of executing all kinds of commands.
    Recovery from node failures is a critical issue in distributed and parallel database systems. When a failure occurs, the recovery system returns the database to a consistent state so that it can continue its service. Moreover, a database node also needs to run recovery during startup, so that it becomes consistent with the other running nodes in the system.
    Among the various recovery techniques, log-based recovery has become popular for its reliability and tolerable overhead. However, in conventional log-based recovery protocols, the nodes that provide the recovery service may still be overburdened, especially when the recovery is resource-intensive. As a result, system performance is compromised and the likelihood of large-scale failure increases. In this thesis, we present an agent-based dynamic recovery protocol. It divides the whole recovery process into three major steps: log-recovery, agent-recovery, and synchronization. The key idea of the new protocol is to cache, in agents, the new database operations issued during recovery. These cached operations can then be replayed independently later. The analysis indicates that the new protocol can speed up recovery by reducing disk I/O and can minimize inter-node dependency during recovery. The system failure rate is therefore reduced and overall performance improves.
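    The three steps named above can be illustrated with a minimal Python sketch. It is not the DP-SQL implementation: the abstract does not specify data structures or message formats, so the names Operation, Node, Agent, and recover, the per-operation sequence number, and the reading of "synchronization" as a final drain of the agent before the node goes back online are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Operation:
    seq: int    # hypothetical global sequence number of the operation
    key: str
    value: str


@dataclass
class Node:
    data: dict = field(default_factory=dict)
    applied_seq: int = 0    # highest sequence number applied so far
    online: bool = False

    def apply(self, op: Operation) -> None:
        # Skip operations already reflected in local state, so an operation
        # found in both the log and the agent cache is applied only once.
        if op.seq <= self.applied_seq:
            return
        self.data[op.key] = op.value
        self.applied_seq = op.seq


class Agent:
    """Caches operations issued while a node is recovering (assumed design)."""

    def __init__(self) -> None:
        self._cached = []

    def cache(self, op: Operation) -> None:
        self._cached.append(op)

    def drain(self) -> List[Operation]:
        # Return the cached operations in sequence order and empty the cache.
        ops = sorted(self._cached, key=lambda o: o.seq)
        self._cached = []
        return ops


def recover(node: Node, log: List[Operation], agent: Agent) -> None:
    # Step 1, log-recovery: replay the operations recorded in the log.
    for op in sorted(log, key=lambda o: o.seq):
        node.apply(op)
    # Step 2, agent-recovery: replay what the agent cached in the meantime.
    for op in agent.drain():
        node.apply(op)
    # Step 3, synchronization (assumed): apply anything cached while step 2
    # was running, then let the node serve requests again.
    for op in agent.drain():
        node.apply(op)
    node.online = True


node = Node()
agent = Agent()
log = [Operation(1, "a", "1"), Operation(2, "b", "2")]
agent.cache(Operation(2, "b", "2"))    # also in the log; skipped on replay
agent.cache(Operation(3, "a", "3"))    # issued during recovery
recover(node, log, agent)
print(node.data, node.online)
```

    In this sketch the recovering node reads only its log and the agent's cache, which mirrors the abstract's point that cached operations can be replayed without leaning further on the other nodes.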
    The remainder of this thesis is organized as follows. Chapter 1 reviews the progress of research on DPDBS. Chapter 2 discusses the features of DPDBS and the conventional recovery mechanisms. Chapter 3 presents the architecture of DP-SQL, together with some implementation details. Chapter 4 analyzes the recovery system in DP-SQL. Chapter 5 presents the agent-based dynamic recovery protocol, along with its proof of correctness, implementation details, and performance analysis. The last chapter concludes the thesis and outlines directions for future research.
