Cross-Domain File System for Distributed Sites (分布式站点间的跨域文件系统)
  • English title: Cross-Domain File System for Distributed Sites
  • Authors: XU Qi (徐琪); WANG Cong (王聪); CHENG Yaodong (程耀东); CHEN Gang (陈刚)
  • Keywords: experimental data in High Energy Physics (HEP); cross-domain access; remote file system; cache; high performance
  • Journal code: JSGG
  • Journal: Computer Engineering and Applications (计算机工程与应用)
  • Institutions: Institute of High Energy Physics, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Publication date: 2019-02-22 09:55
  • Publisher: Computer Engineering and Applications
  • Year: 2019
  • Issue: v.55; No.927
  • Funding: National Key Research and Development Program of China (No. 2016YFB1000605); National Natural Science Foundation of China (No. 11575223)
  • Language: Chinese
  • Page: JSGG201908002
  • Page count: 9
  • CN: 08
  • Classification number: 7-14+22
Abstract
Most High Energy Physics (HEP) research relies on large scientific facilities at fixed sites that produce massive volumes of experimental data, so computing jobs often have to work on data held at remote sites. The traditional HEP computing model shares this distributed data across domains through grid technology, but low resource utilization, long response times, and difficult deployment and maintenance limit the use of grids for data sharing between small and medium-sized sites. To address data sharing between such sites in the HEP computing environment, this paper designs a remote file system built around the idea of streaming and caching. The system makes access to remote data look like local access, provides a low-latency access mode, implements on-demand data transfer and management over HTTP, stores data blocks by hash, and presents a unified file view. Compared with EOS, Lustre, and GlusterFS, distributed file systems commonly used in HEP computing, it is usable over a WAN, insensitive to network latency, and offers high-performance data access.
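The abstract names three mechanisms without detail: on-demand transfer over HTTP, a local cache, and hash-based storage of data blocks. The following is only a minimal sketch of how such pieces could fit together, not the authors' implementation. The names `BlockCache`, `http_range_fetch`, and `BLOCK_SIZE`, the 1 MiB block size, the SHA-1 directory layout, and the use of HTTP `Range` requests are all assumptions made for illustration; the sketch also assumes the server honors `Range` headers.

```python
import hashlib
import os
import tempfile
import urllib.request

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB block size; the abstract does not state one


def http_range_fetch(url, start, end):
    """Fetch bytes [start, end] (inclusive) of url with an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


class BlockCache:
    """Hypothetical on-demand block cache; an illustration, not the paper's design."""

    def __init__(self, cache_dir=None, fetch=http_range_fetch, block_size=BLOCK_SIZE):
        # Default to a fresh temporary directory if no cache location is given.
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="blockcache-")
        self.fetch = fetch
        self.block_size = block_size

    def _block_path(self, url, index):
        # Hash (url, block index) so blocks spread evenly over subdirectories.
        digest = hashlib.sha1(f"{url}#{index}".encode()).hexdigest()
        return os.path.join(self.cache_dir, digest[:2], digest[2:])

    def _get_block(self, url, index):
        path = self._block_path(url, index)
        if os.path.exists(path):  # cache hit: serve from local disk
            with open(path, "rb") as f:
                return f.read()
        start = index * self.block_size  # cache miss: fetch only this block
        data = self.fetch(url, start, start + self.block_size - 1)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return data

    def read(self, url, offset, length):
        """Return up to length bytes starting at offset, fetching blocks as needed."""
        if length <= 0:
            return b""
        out = bytearray()
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for index in range(first, last + 1):
            block = self._get_block(url, index)
            base = index * self.block_size
            lo = max(offset - base, 0)
            hi = min(offset + length - base, self.block_size)
            out += block[lo:hi]
        return bytes(out)
```

A call such as `BlockCache().read(url, offset, length)` transfers only the blocks that overlap the requested range, and a repeated read of the same range is served from local disk. Passing a different `fetch` callable allows the transport to be swapped or stubbed out for testing.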
