Deduplication Performance Analysis Based on the FSL Dataset
  • English title: Deduplication Performance Analysis Based on FSL Dataset
  • Authors: 曹晖; 张秦正
  • Authors (English): CAO Hui; ZHANG Qin-zheng
  • Keywords: data deduplication; deduplication ratio; metadata; storage
  • Chinese journal title (code): DKDX
  • English journal title: Journal of University of Electronic Science and Technology of China
  • Institution: School of Computer Science and Engineering, University of Electronic Science and Technology of China
  • Publication date: 2018-07-24
  • Publisher: Journal of University of Electronic Science and Technology of China (电子科技大学学报)
  • Year: 2018
  • Volume: 47
  • Language: Chinese
  • Record ID: DKDX201804023
  • Page count: 5
  • Issue: 04
  • CN: 51-1207/T
  • Pages: 143-147
Abstract
Data deduplication is a data reduction technique that compresses highly redundant datasets and thereby lowers the cost of wasted space in storage systems. Whereas most previous studies examined small-scale static snapshots or snapshots covering only a short period, this paper uses large-scale snapshots covering a long period, selected from a shared user file system. It studies the characteristics of the backup dataset from the perspectives of files, data chunks, and users, analyzes the strengths and weaknesses of deduplication performance under different chunking methods and strategies, reports the highest deduplication ratio obtained, and offers suggestions for the design of future deduplication systems.
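To make the comparison of chunking methods concrete, the following is a minimal sketch, not the authors' code. It assumes the deduplication ratio is defined as logical bytes divided by unique stored bytes, which the abstract does not state. It compares whole-file deduplication with fixed-size chunking on toy data. The names `fixed_size_chunks`, `dedup_ratio`, and `snapshots` are hypothetical, and SHA-256 is an assumed fingerprint function.

```python
import hashlib


def fixed_size_chunks(data: bytes, size: int):
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def dedup_ratio(files, chunk_size=None):
    """Return logical bytes / unique stored bytes (assumed definition).

    chunk_size=None treats each file as a single chunk (whole-file dedup).
    """
    logical = 0
    stored = 0
    seen = set()  # fingerprints of chunks already stored
    for data in files:
        chunks = [data] if chunk_size is None else fixed_size_chunks(data, chunk_size)
        for chunk in chunks:
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return logical / stored if stored else 1.0


# Toy "snapshots": three 16-byte files that share their first 8 bytes.
snapshots = [b"A" * 8 + b"B" * 8, b"A" * 8 + b"C" * 8, b"A" * 8 + b"B" * 8]
whole_file_ratio = dedup_ratio(snapshots)            # only the third file is a duplicate
chunked_ratio = dedup_ratio(snapshots, chunk_size=8)  # the shared prefix is also detected
print(whole_file_ratio, chunked_ratio)
```

In this toy case chunk-level deduplication finds the shared prefix that whole-file deduplication misses. Content-defined chunking, which this sketch does not implement, would additionally tolerate insertions that shift chunk boundaries.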
