Parallel implementation and optimization of a large-scale ocean data assimilation algorithm (面向大规模海洋数据同化算法的并行实现及优化)
  • English title: Parallel implementation and optimization of a large scale ocean data assimilation algorithm
  • Authors: WAN Wei-qiang (万威强); XIAO Jun-min (肖俊敏); HONG Xue-hai (洪学海); TAN Guang-ming (谭光明)
  • Affiliation: Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)
  • Keywords: ocean data assimilation; ensemble optimal interpolation (EnOI); domain decomposition; IO proxy node
  • Journal code: JSJK
  • Journal: Computer Engineering & Science (计算机工程与科学)
  • Publication date: 2019-05-15
  • Publisher: Computer Engineering & Science (计算机工程与科学)
  • Year: 2019
  • Issue: v.41; No.293
  • Funding: National Key R&D Program of China, Key Special Project (2016YFC1401706); National Natural Science Foundation of China (61802369)
  • Language: Chinese
  • Page: JSJK201905001
  • Number of pages: 8
  • CN: 05
  • ISSN: 43-1258/TP
  • Classification code: 5-12
Abstract
Ocean data assimilation is an effective way to merge ocean observations into numerical ocean models. Assimilated ocean data is closer to the true state of the ocean, which makes it important for understanding and studying the ocean. We design a general parallel implementation of ocean data assimilation based on domain decomposition and, building on it, propose a new parallel algorithm based on IO proxies. First, the IO proxy processes read the data in parallel. Next, they split the data into blocks and send each block to the corresponding computation process. Once the computation processes finish their local data assimilation, the IO proxy processes collect the results and write them to disk. The main advantage of this method is that only the IO proxy processes perform IO, rather than every process taking part in IO as in the traditional approach (direct parallel IO). This prevents a large number of processes from accessing the disk simultaneously and effectively avoids the waiting caused by process queuing. Tests on the Tianhe-2 cluster show that, for data assimilation at 1-degree resolution on 425 cores, the total running time of the parallel implementation is 9.1 s, a speedup of nearly 38 over the traditional serial program. In addition, for data assimilation at 0.1-degree resolution, the IO-proxy-based parallel assimilation algorithm still scales well on 10,000 cores, and its IO time can be kept to at most 1/9 of the direct parallel IO time.
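The IO proxy flow described in the abstract (proxies read, split the data into blocks, hand the blocks to computation processes, then collect and write the results) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the paper targets a cluster with a parallel file system, whereas this sketch uses a thread pool in place of computation processes, a nested list in place of the input file, and a returned list in place of the output file. All function names (`read_slab`, `split_blocks`, `assimilate_block`, `merge_blocks`, `run_io_proxy`) are hypothetical, and `assimilate_block` is a placeholder that merely halves each value rather than performing EnOI.

```python
from concurrent.futures import ThreadPoolExecutor


def read_slab(grid, proxy_id, n_proxies):
    """Stand-in for one proxy's read: return its contiguous band of rows."""
    rows = len(grid)
    start = proxy_id * rows // n_proxies
    stop = (proxy_id + 1) * rows // n_proxies
    return start, grid[start:stop]


def split_blocks(slab, n_workers):
    """Cut a slab column-wise into one block per computation worker."""
    cols = len(slab[0])
    bounds = [w * cols // n_workers for w in range(n_workers + 1)]
    return [[row[bounds[w]:bounds[w + 1]] for row in slab]
            for w in range(n_workers)]


def assimilate_block(block):
    """Placeholder for local assimilation (assumption: just halves values)."""
    return [[0.5 * v for v in row] for row in block]


def merge_blocks(blocks):
    """Reassemble column blocks into full rows, in worker order."""
    return [sum((b[r] for b in blocks), []) for r in range(len(blocks[0]))]


def run_io_proxy(grid, n_proxies=2, workers_per_proxy=3):
    """Only the proxies touch 'disk' (grid and output); workers only compute."""
    output = [None] * len(grid)  # stands in for the result file
    with ThreadPoolExecutor(max_workers=workers_per_proxy) as pool:
        # Proxies are looped over sequentially here for simplicity;
        # in a real run they would operate concurrently.
        for proxy_id in range(n_proxies):
            start, slab = read_slab(grid, proxy_id, n_proxies)
            if not slab:
                continue  # more proxies than rows: nothing to do
            blocks = split_blocks(slab, workers_per_proxy)
            results = list(pool.map(assimilate_block, blocks))
            for offset, row in enumerate(merge_blocks(results)):
                output[start + offset] = row  # proxy writes collected rows
    return output


if __name__ == "__main__":
    demo = [[float(10 * r + c) for c in range(7)] for r in range(5)]
    for row in run_io_proxy(demo):
        print(row)
```

The point of the design is visible in `run_io_proxy`: the number of readers and writers is `n_proxies`, independent of how many workers compute, which is what keeps the file system from being hit by every process at once. A real local assimilation step would also need nearby observations and neighboring grid points for each block, which this sketch omits.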
