Design and Implementation of Online Reconstruction and RAID5 Capacity Expansion for Disk Arrays

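English title: Design and Implementation of the Recovery and Online Capacity Expansion for RAID
Author: Shu Xing (舒星)
Thesis level: Master's
Discipline: Computer Architecture
Chinese keywords (translated): Disk Array ; Reliability ; Reconstruction Algorithm ; Online Capacity Expansion
English keywords: Redundant Arrays of Inexpensive Disks ; Reliability ; Reconstruction Algorithm ; Online Capacity Expansion
Degree year: 2007
Advisor: Zhou Ke (周可)
Discipline code: 081201
Degree-granting institution: Huazhong University of Science and Technology
Thesis submission date: 2007-06-01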

Abstract

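As informatization advances, the volume of information to be stored is growing explosively and storage systems are becoming ever larger, so reliability and scalability have become two key measures of a storage system's overall performance. Because the disk array is the basic building block of large-scale storage systems, how to effectively improve its fault tolerance and recovery capability, and how to expand its capacity quickly online, have become new research focuses.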
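     Based on an analysis of existing reconstruction algorithms and of workload characteristics, a Popularity-Based Reconstruction Optimization algorithm (PRO) is designed and implemented. Its core idea is to rebuild the data regions that users access most frequently first, reducing head-movement overhead as far as possible so as to shorten both user response time and reconstruction time. In addition, before the array rebuild completes, read requests that fall on already-rebuilt data blocks of the replacement disk are redirected, and repeated reconstruction writes to already-rebuilt blocks are reduced, which further improves system reliability. Test results show that, compared with reconstruction without PRO, reconstruction with PRO improves both the user response time during reconstruction and the reconstruction time. Experiments also show that PRO as implemented on Linux is best suited to online transaction processing (OLTP) applications, where reads and writes are mixed and small requests dominate.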
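     Building on the existing disk array control software, online capacity expansion for RAID level 5 is designed and implemented, with the design guided by the order in which commands are executed. Users can dynamically expand array space online by adding new disks, or by copying the original disks to larger-capacity disks and replacing them, without shutting down or restarting the system and without backing up additional data, while the storage service remains uninterrupted throughout the expansion. Experimental results show that online capacity expansion improves the scalability of disk array storage systems.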
With the wide use of information technology, explosively growing information drives the demand for capacity, and the demand for storage system reliability grows as well because of the various risks involved. How to increase RAID capacity at the least cost, ensure system reliability without degrading service performance, and overcome the speed bottleneck is therefore becoming increasingly important in the storage world.
     To improve system reliability, we focus on the recovery mechanism. The Popularity-Based Reconstruction Optimization algorithm (PRO) is implemented based on a study of the characteristics of existing reconstruction algorithms and of workloads. The algorithm reconstructs the regions that users visit frequently as early as possible, reducing disk head movement and thereby both user response time and reconstruction time. Read redirection is also implemented: whereas the original scheme serves reads directly from the replacement disk only after reconstruction completes, requests that fall on already-rebuilt data on the replacement disk no longer need reconstruction reads, and double synchronous writes are avoided. The results demonstrate that PRO implemented on Linux greatly outperforms the original algorithm under OLTP application workloads, which mix reads and writes and are dominated by small I/O requests.
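The hot-region-first ordering and the read redirection described above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the thesis implementation: the zone granularity, the class `PopularityRebuildScheduler`, and all method names are hypothetical, and real reconstruction I/O is replaced by simple bookkeeping.

```python
from collections import Counter


class PopularityRebuildScheduler:
    """Toy model of hot-zone-first reconstruction with read redirection.

    Hypothetical illustration only; names and granularity are assumptions.
    """

    def __init__(self, num_zones):
        self.num_zones = num_zones
        self.hits = Counter()   # user accesses observed per zone
        self.rebuilt = set()    # zones already restored on the replacement disk

    def record_access(self, zone):
        """Count one user request that touched the given zone."""
        self.hits[zone] += 1

    def next_zone(self):
        """Return the most popular zone not yet rebuilt, or None when done.

        Ties are broken by the lowest zone number, so zones with equal
        popularity are still swept in ascending order.
        """
        pending = [z for z in range(self.num_zones) if z not in self.rebuilt]
        if not pending:
            return None
        return max(pending, key=lambda z: (self.hits[z], -z))

    def mark_rebuilt(self, zone):
        """Record that a zone has been written to the replacement disk."""
        self.rebuilt.add(zone)

    def read_source(self, zone):
        """Decide how to serve a read aimed at the failed disk's address space."""
        if zone in self.rebuilt:
            return "replacement"   # redirect: data is already on the new disk
        return "reconstruct"       # rebuild on the fly from the surviving disks


def rebuild_order(scheduler):
    """Drain the scheduler and return the order in which zones get rebuilt."""
    order = []
    while (zone := scheduler.next_zone()) is not None:
        scheduler.mark_rebuilt(zone)
        order.append(zone)
    return order
```

In this model a zone that users hit often is rebuilt early, so later reads to it are served from the replacement disk without a reconstruction read.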
     Growing data volumes require much more online storage. Online Capacity Expansion is designed following the order of command execution and implemented in our RAID abstraction layer. It lets users increase the capacity of the RAID system by adding new disks, or by copying the original disks to larger disks and replacing them, without shutting down and restarting the system, backing up their data, or interrupting use. Online Capacity Expansion technology greatly improves the scalability of RAID.
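Why adding a disk to a RAID5 array requires data to be relocated while the array stays online can be seen from the block mapping. The following is a minimal sketch under stated assumptions: the abstract does not describe the actual layout or migration procedure, so the rotating-parity layout and the function names `raid5_locate`, `blocks_to_move`, and `usable_blocks` are hypothetical.

```python
def raid5_locate(block, num_disks):
    """Map a logical data block to (stripe, data_disk, parity_disk).

    Assumes one block per stripe unit and parity rotating from the last
    disk towards the first, one position per stripe (an assumed layout).
    """
    data_per_stripe = num_disks - 1
    stripe, index = divmod(block, data_per_stripe)
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data units fill the non-parity disks in ascending disk order.
    data_disk = index if index < parity_disk else index + 1
    return stripe, data_disk, parity_disk


def blocks_to_move(num_blocks, old_disks, new_disks):
    """List logical blocks whose (stripe, disk) location changes after expansion."""
    return [
        b for b in range(num_blocks)
        if raid5_locate(b, old_disks)[:2] != raid5_locate(b, new_disks)[:2]
    ]


def usable_blocks(num_disks, blocks_per_disk):
    """Usable capacity of a RAID5 array: one disk's worth of space holds parity."""
    return (num_disks - 1) * blocks_per_disk
```

Under this assumed layout, growing from four to five disks leaves the first stripe's data blocks in place but moves every block of the second stripe, which is the kind of relocation an online expansion has to perform while still serving requests.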

