Research on Key Technologies of Information Lifecycle Management in Content Aware Storage System (内容感知存储系统中信息生命周期管理关键技术研究)

English title: Research on Key Technologies of Information Lifecycle Management in Content Aware Storage System
Author: Nie Xuejun (聂雪军)
Thesis level: Doctoral
Discipline: Computer System Architecture
Chinese keywords: 信息生命周期管理 ; 内容感知存储系统 ; 内容元数据 ; 信息整合 ; 内容分类 ; 分级存储 ; 重复数据删除 ; 信息归档
English keywords: Information Lifecycle Management ; Content Aware Storage System ; Content Metadata ; Information Integration ; Content Classification ; Tiered Storage ; Data De-duplication ; Information Archive
Degree year: 2011
Advisor: Zhou Jingli (周敬利)
Discipline code: 081201
Degree-granting institution: Huazhong University of Science and Technology (华中科技大学)
Thesis submission date: 2011-05-01

Abstract

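As demand for intelligent storage systems grows, more and more application-layer functions, such as self-management, data security, and information retrieval, are being moved into the storage system. Traditional storage systems mainly process data at the block or object level and lack file-level information, so Information Lifecycle Management (ILM) cannot be integrated into them. Content Aware Storage systems that follow the XAM (eXtensible Access Method) specification carry the file-level information of data in content metadata, and therefore provide a basis for integrating ILM into the storage system.
     This thesis studies the key technologies involved in integrating ILM into content aware storage systems, building the ILM data processing stages of information integration, content classification, tiered storage, data backup, and information archiving around content metadata. The main work includes:
     An information integration method based on content metadata is proposed and implemented. A content metadata specification is defined for the data processing needs of ILM, covering the definition, extraction, representation, and transmission of content metadata. Building on content metadata, unstructured information is integrated in terms of both external form and internal semantics. A storage system prototype supporting the content metadata specification is designed and implemented. Performance tests show that information integration speeds up data preprocessing while having very little impact on the average I/O performance of the storage system.
     A classification algorithm for content metadata is proposed and implemented. Because content metadata offers few classification features, but features of high semantic quality, a similarity computation model for content metadata based on feature word sets is constructed. The model builds a similarity matrix from the feature word sets in the training samples, computes the implicit correlations between feature words by smoothing the matrix, and uses the result to compute feature vectors for content metadata. A data classifier is then built from the feature vectors with the K-Means algorithm. Performance tests show that the algorithm achieves higher precision and mutual information than traditional data classification algorithms and greatly reduces classification time.
     A content-metadata-driven tiered storage model is proposed and implemented, comprising tiered storage based on application requirements and tiered storage based on cost requirements. The former meets the needs of information in applications such as backup, archiving, security, and access control; the latter focuses on reducing the storage cost per unit of information while preserving the I/O performance of the storage system. An adaptive data migration algorithm based on rate control is proposed to minimize the impact of migration I/O on the normal I/O of the storage system. Performance tests show that the model effectively meets the storage requirements of information without affecting the overall performance of the storage system.
     A data de-duplication algorithm based on content characteristics is proposed and implemented. Current de-duplication algorithms for data backup do not consider that the bit-value distribution of content differs between file types. The algorithm represents the content characteristics of a file type with a candidate boundary histogram and uses it to optimize the key parameters of traditional de-duplication algorithms. It accepts a lower data reduction ratio between files of different types in exchange for a higher data reduction ratio among files of the same type. A file system, TDFS, is designed to store variable-length data chunks efficiently. Performance tests show that the algorithm substantially improves the data reduction ratio on specific data sets.
     An information archiving model based on content metadata is proposed and implemented. Content metadata tags that support the OAIS (Open Archival Information System) archiving specification are introduced to achieve logical preservation of information. A disk-based software WORM (Write Once Read Many) model is proposed, which achieves physical preservation of information by changing how disk functions are partitioned and how the disk responds to iSCSI commands. Archived files are encrypted and their keys are destroyed once the retention period has expired, which achieves secure destruction of information; a key management mechanism based on time windows is also proposed to reduce the complexity of key management. Performance tests show that the model effectively meets both the functional and the performance requirements of archived information.
     Experiments show that content aware storage systems effectively address the lack of file-level semantics in traditional storage systems. Building the key data processing stages of the ILM model around content metadata not only reduces the complexity of integrating ILM into the storage system but also greatly improves data access performance, meeting the demand for intelligent storage systems.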
Intelligent storage systems need to integrate application-layer functions, such as self-management, data security, and information retrieval, into the storage layer. However, Information Lifecycle Management (ILM) cannot be integrated into traditional storage systems because they lack the file-level information needed in the various stages of ILM. Content Aware Storage (CAS), which is based on the XAM specification, supports such integration. By wrapping file-level information in content metadata, CAS supplies the information that ILM data processing needs, which provides the basis for integrating ILM into storage systems.
     This thesis presents several key technologies involved in integrating ILM stages into CAS, including information integration, content classification, tiered storage, data backup, and information archival. The main work includes:
     An information integration model based on content metadata is proposed. A content metadata specification is defined from the requirements of ILM; it covers the definition, extraction, representation, and transport of content metadata. Information is integrated in terms of both outer format and inner semantics. A CAS prototype that supports the content metadata specification is designed and developed. Experimental results show that information integration degrades I/O performance very little.
     A content-metadata-oriented information classification algorithm is proposed. A model for computing the similarity between content metadata is designed to overcome the shortage of characteristic words. The model constructs a similarity matrix for characteristic words from their explicit relations in the training sample collection, then calculates their implicit relations with a matrix smoothing algorithm and obtains a set of linearly independent vectors, from which the characteristic vectors of content metadata are calculated. The data classifier is built from the characteristic vectors with the K-Means clustering algorithm. Experimental results show that this algorithm achieves higher accuracy and mutual information than traditional classification algorithms and significantly reduces computing time.
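The classification pipeline described above can be illustrated with a minimal sketch. The abstract does not give the actual smoothing formula or the way characteristic vectors are derived, so everything below is an assumption made for illustration: co-occurrence counts stand in for the explicit relations, `S + alpha * (S x S)` stands in for the smoothing step, a record's vector is the average of its words' smoothed rows, and the vocabulary and sample data are invented.

```python
from itertools import combinations


def normalise(row):
    """Scale a row so that it sums to 1 (all-zero rows are left unchanged)."""
    total = sum(row)
    return [v / total for v in row] if total else list(row)


def build_similarity(samples, vocab):
    """Explicit relations: co-occurrence counts of characteristic words."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    s = [[0.0] * n for _ in range(n)]
    for words in samples:
        present = sorted({index[w] for w in words if w in index})
        for i in present:
            s[i][i] += 1.0
        for i, j in combinations(present, 2):
            s[i][j] += 1.0
            s[j][i] += 1.0
    return [normalise(row) for row in s]


def smooth(s, alpha=0.5):
    """Assumed smoothing step: S + alpha * (S x S), re-normalised per row."""
    n = len(s)
    s2 = [[sum(s[i][k] * s[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [normalise([s[i][j] + alpha * s2[i][j] for j in range(n)])
            for i in range(n)]


def feature_vector(words, vocab, s):
    """Average the smoothed rows of the words found in one metadata record."""
    index = {w: i for i, w in enumerate(vocab)}
    rows = [s[index[w]] for w in set(words) if w in index]
    if not rows:
        return [0.0] * len(vocab)
    return [sum(col) / len(rows) for col in zip(*rows)]


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain K-Means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical vocabulary and training samples, invented for this sketch.
vocab = ["invoice", "payment", "tax", "xray", "patient", "diagnosis"]
training = [["invoice", "payment"], ["payment", "tax"],
            ["xray", "patient"], ["patient", "diagnosis"]]
explicit = build_similarity(training, vocab)
smoothed = smooth(explicit)
records = [["invoice"], ["tax"], ["xray"], ["diagnosis"]]
labels = kmeans([feature_vector(r, vocab, smoothed) for r in records], k=2)
```

In this toy data "invoice" and "tax" never co-occur, so their explicit similarity is zero; after smoothing they are linked through "payment", which is the kind of implicit relation the paragraph refers to.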
     A content-metadata-driven tiered storage model is proposed, comprising tiered storage based on application requirements and tiered storage based on cost requirements. The former satisfies application requirements such as backup, archival, security, and access control; the latter reduces storage cost while guaranteeing overall I/O performance. An adaptive data migration algorithm based on migration speed control is proposed, which minimizes the negative impact of migration I/O on normal I/O. Experimental results show that tier computation and data migration under this model do not degrade the performance of the storage system, while the storage cost of information is reduced.
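A rough sketch of rate-controlled migration follows. The abstract does not describe the control law, so this example substitutes a generic additive-increase, multiplicative-decrease rule driven by foreground I/O latency; the class name, parameters, and thresholds are all hypothetical.

```python
class MigrationRateController:
    """Throttle migration I/O according to observed foreground latency.

    Assumed policy (not taken from the thesis): raise the migration rate
    by a fixed step while foreground latency is within target, and halve
    it as soon as the target is exceeded.
    """

    def __init__(self, min_rate=1.0, max_rate=100.0, step=5.0,
                 backoff=0.5, latency_target_ms=10.0):
        self.min_rate = min_rate            # MB/s floor so migration still progresses
        self.max_rate = max_rate            # MB/s ceiling
        self.step = step                    # additive increase per interval
        self.backoff = backoff              # multiplicative decrease factor
        self.latency_target_ms = latency_target_ms
        self.rate = min_rate

    def update(self, foreground_latency_ms):
        """Return the migration rate to use for the next interval."""
        if foreground_latency_ms > self.latency_target_ms:
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            self.rate = min(self.max_rate, self.rate + self.step)
        return self.rate


# Simulated latency samples (ms) for six control intervals.
controller = MigrationRateController()
rates = [controller.update(ms) for ms in [4, 5, 6, 25, 30, 5]]
```

The rate climbs while normal I/O is healthy and backs off quickly under load, which is one simple way to keep migration traffic from competing with foreground requests.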
     A data de-duplication algorithm based on content characteristics is proposed. By introducing a candidate chunk boundary histogram, the algorithm takes the differences between file types into account and optimizes the key parameters of traditional de-duplication algorithms. The key idea is to trade redundancy elimination across files of different types for better redundancy elimination among files of the same type. A file system, TDFS, is proposed to store the variable-length chunks. Experimental results show that the algorithm improves the data compression ratio by 9.0% on average on some special data sets.
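The idea of a candidate chunk boundary histogram can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: a simple polynomial rolling hash stands in for whatever fingerprint the author uses, the window size, mask, and bucket width are arbitrary, and the histogram is only computed here, whereas a real tuner would use it to choose per-file-type parameters such as the mask and the minimum and maximum chunk sizes.

```python
import hashlib
from collections import Counter

WINDOW = 16                 # bytes covered by the rolling hash (assumed)
BASE = 257
MOD = (1 << 31) - 1


def candidate_boundaries(data, mask=0x3F):
    """Offsets where the hash of the last WINDOW bytes matches the mask."""
    out = []
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + b) % MOD                     # add the newest byte
        if i + 1 >= WINDOW and (h & mask) == mask:
            out.append(i + 1)                        # boundary after byte i
    return out


def gap_histogram(candidates, bucket=32):
    """Histogram of distances between consecutive candidate boundaries."""
    return Counter((b - a) // bucket * bucket
                   for a, b in zip(candidates, candidates[1:]))


def chunk(data, candidates, min_size, max_size):
    """Cut at candidates, skipping short gaps and forcing cuts on long ones."""
    chunks, start = [], 0
    for cut in candidates + [len(data)]:
        while cut - start > max_size:                # no usable candidate in range
            chunks.append(data[start:start + max_size])
            start += max_size
        if cut > start and (cut - start >= min_size or cut == len(data)):
            chunks.append(data[start:cut])
            start = cut
    return chunks


# Deterministic pseudo-random sample data, generated only for this sketch.
data = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
cands = candidate_boundaries(data)
hist = gap_histogram(cands)
chunks = chunk(data, cands, min_size=32, max_size=256)
unique = {hashlib.sha256(c).hexdigest() for c in chunks}   # fingerprint index
```

Because boundaries depend only on the bytes inside the window, identical content tends to produce identical chunks wherever it appears, and the shape of `hist` differs between file types, which is the property the paragraph exploits.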
     An information archival model based on content metadata is proposed. By introducing content metadata tags that support the OAIS specification, the model achieves logical preservation of information. By modifying disk functions and the responses to iSCSI commands, the model implements a disk-based soft WORM and achieves physical preservation of information. A key-based secure destruction scheme is proposed, which encrypts archived information and deletes the key when the retention period has expired. Time-based management of encryption keys is proposed, which significantly reduces the complexity of key management. Experimental results show that the archival model satisfies both functional and performance requirements.
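Key-based destruction with time-based key management can be sketched as below. The design shown is an assumption: all records whose retention ends in the same time window share one key, and deleting that key after the window closes makes every such record unreadable. The window length, class name, and record layout are hypothetical, and the SHA-256 keystream is a toy stand-in for a real cipher that must not be used to protect data.

```python
import hashlib
import os

WINDOW_SECONDS = 30 * 24 * 3600      # assumed window length (about one month)


def _keystream_xor(key, nonce, data):
    """Toy stream cipher for illustration only; NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class WindowKeyStore:
    """One encryption key per retention-expiry window."""

    def __init__(self):
        self._keys = {}              # window index -> key bytes

    @staticmethod
    def _window(timestamp):
        return int(timestamp // WINDOW_SECONDS)

    def archive(self, plaintext, expires_at):
        """Encrypt a record under the key of the window in which it expires."""
        window = self._window(expires_at)
        if window not in self._keys:
            self._keys[window] = os.urandom(32)
        nonce = os.urandom(16)
        return {"window": window, "nonce": nonce,
                "ciphertext": _keystream_xor(self._keys[window], nonce, plaintext)}

    def read(self, record):
        """Decrypt a record; fails once its window key has been destroyed."""
        key = self._keys.get(record["window"])
        if key is None:
            raise KeyError("retention window expired; key destroyed")
        return _keystream_xor(key, record["nonce"], record["ciphertext"])

    def purge(self, now):
        """Destroy the keys of all windows that ended before `now`."""
        current = self._window(now)
        expired = [w for w in self._keys if w < current]
        for w in expired:
            del self._keys[w]
        return len(expired)
```

With this layout the number of keys grows with the number of windows rather than the number of files, and a key is only destroyed after every record in its window has passed its expiry time.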
     As the experiments show, the content aware storage system effectively resolves the lack of file-level semantics in traditional storage systems. By building the key data processing stages of ILM on content metadata, the complexity of integrating ILM into storage systems is greatly reduced and data I/O performance is improved, which satisfies the requirements of intelligent storage systems.
