An Efficient Failure Reconstruction Based on In-Network Computing for Erasure-Coded Storage Systems

Original title (Chinese): 纠删码存储系统中基于网络计算的高效故障重建方法
Authors: Tang Yingjie (唐英杰); Wang Fang (王芳); Xie Yanwen (谢燕文)
Affiliations: Wuhan National Laboratory for Optoelectronics (Huazhong University of Science and Technology); Key Laboratory of Information Storage System (Huazhong University of Science and Technology), Ministry of Education; Shenzhen Huazhong University of Science and Technology Research Institute
Keywords: distributed storage system; erasure code; software defined networking (SDN); recovery overhead; in-network computing
Journal code: JFYZ
Journal: Journal of Computer Research and Development
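Publication date: 2019-04-15
Publisher: Journal of Computer Research and Development (计算机研究与发展)
Year: 2019
Volume: 56
Funding: National Natural Science Foundation of China (61772216); Wuhan Applied Basic Research Program (2017010201010103); Shenzhen Science and Technology Program (JCYJ20170307172248636); Fundamental Research Funds for the Central Universities; National Defense Pre-Research Project (31511010202)
Language: Chinese
Page: JFYZ201904009
Page count: 12
Issue: 04
CN: 11-1777/TP
Classification number: 93-104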

Abstract

Distributed storage systems keep growing in scale, and whether the storage devices are disks or solid-state drives, such systems always face the risk of data loss. Traditional distributed storage systems mostly rely on three-way replication for high reliability, but to lower storage overhead many systems are shifting to reliability schemes based on erasure codes. With erasure codes, however, reconstructing a failed block requires reading from multiple storage devices, which causes heavy network traffic and storage I/O and raises the recovery overhead. To reduce recovery overhead without sacrificing other performance, this paper uses software defined networking (SDN) to build an efficient failure reconstruction scheme based on in-network computing, called in-network pipeline (INP). The SDN controller uses its global view of the network topology to construct a reconstruction tree. The system transmits data along this tree and the switches perform part of the computation, which reduces the traffic forwarded downstream, removes the network bottleneck, and improves recovery performance. The recovery efficiency of INP is evaluated under different network bandwidths. The experimental results show that, compared with a conventional erasure-coded system, INP always reduces network traffic substantially and, under certain bandwidth conditions, achieves a degraded read time close to that of a normal read.
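
The idea of letting switches compute partial results along a reconstruction tree can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single XOR parity block instead of a general erasure code, a hand-written tree instead of one built by an SDN controller, and hypothetical names throughout (`aggregate`, `xor_blocks`, the node labels `client`, `core`, `tor1`, `tor2`, `s0`, `s2`, `sp`). Because XOR is associative, each inner node can combine what it receives from its children and forward a single block upward.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of one or more equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def aggregate(node, tree, local_data, traffic):
    """Return the partial XOR of all surviving blocks below `node`.

    tree       -- hypothetical reconstruction tree: node -> list of children
    local_data -- storage node -> surviving block held by that node
    traffic    -- dict counting blocks sent over each (child, parent) link
    """
    parts = []
    if node in local_data:
        parts.append(local_data[node])
    for child in tree.get(node, []):
        # Each child sends exactly one (already combined) block upward.
        parts.append(aggregate(child, tree, local_data, traffic))
        traffic[(child, node)] = traffic.get((child, node), 0) + 1
    return xor_blocks(parts)


# Assumed toy stripe: three data blocks and one XOR parity block.
d0, d1, d2 = b"alpha---", b"bravo---", b"charlie-"
parity = xor_blocks([d0, d1, d2])

# d1 is lost; the survivors sit behind two top-of-rack switches.
local_data = {"s0": d0, "s2": d2, "sp": parity}
tree = {
    "client": ["core"],
    "core": ["tor1", "tor2"],
    "tor1": ["s0", "s2"],
    "tor2": ["sp"],
}

traffic = {}
rebuilt = aggregate("client", tree, local_data, traffic)
last_hop_blocks = traffic[("core", "client")]
```

In this toy topology the link into the requester carries one block, whereas a conventional degraded read would pull all three surviving blocks over that same link before combining them at the client.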
