Hierarchical optimization method for large-scale 3D parallel hierarchical scalable matrix multiplication
  • English title: Hierarchical optimization method for parallel hierarchical scalable matrix multiplication on large scale platform
  • Authors: Lu Lian (卢炼); Yang Aimin (阳爱民)
  • Affiliations: Dept. of Information Engineering, Zhongshan Torch Polytechnic; Cisco Information College, Guangdong University of Foreign Studies
  • Keywords: large scale platform; parallel computing; matrix multiplication; hierarchical optimization
  • Journal abbreviation (Chinese title): JSYJ
  • Journal (English title): Application Research of Computers
  • Publication date: 2016-07-22 20:46
  • Publisher: Application Research of Computers (计算机应用研究)
  • Year: 2017
  • Issue: v.34; No.308
  • Language: Chinese
  • Page: JSYJ201706026
  • Page count: 5
  • CN: 06
  • ISSN: 51-1196/TP
  • Classification number: 119-123
Abstract
To improve the parallel efficiency of scalable matrix multiplication on large-scale platforms, this paper proposes a hierarchical optimization method for parallel hierarchical scalable matrix multiplication. First, building on a study of pivot-row and pivot-column communication in the scalable matrix multiplication (SMM) algorithm, the processor grid is partitioned at a higher level into rectangular groups in a hierarchical manner. This turns the two-dimensional computation of matrix multiplication into a three-dimensional one. Corresponding intra-cluster and inter-cluster communication procedures are then designed, yielding a hierarchical parallel optimization of SMM (HSMM). Second, the HSMM algorithm is analyzed theoretically: its communication cost is analyzed and predicted case by case, and a rule is derived for choosing the number of clusters that gives the best computation cost. Finally, experiments on the Grid5000 and BlueGene/P test platforms show that the proposed algorithm outperforms the comparison algorithm in both execution time and communication time, which verifies the effectiveness of the algorithm and the correctness of the theoretical analysis.
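The two-level idea in the abstract can be illustrated with a small serial simulation. This is only a sketch under stated assumptions, not the paper's HSMM implementation: the function names (`flat_pivot_multiply`, `two_level_pivot_multiply`) and the panel-width parameters (`panel`, `outer`, `inner`) are hypothetical, and the "broadcasts" are simply counted rather than performed. It assumes that a one-level pivot scheme performs one broadcast step per pivot panel, while a two-level scheme first moves a wide panel between groups and then splits it into narrower panels inside each group.

```python
# Illustrative serial simulation only; names and parameters are hypothetical
# and are not taken from the paper.

def accumulate_panel(A, B, C, ks):
    """Accumulate C += A[:, ks] * B[ks, :] for the pivot indices in ks."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            C[i][j] += sum(A[i][k] * B[k][j] for k in ks)


def flat_pivot_multiply(A, B, panel):
    """One-level scheme: one (simulated) broadcast step per pivot panel."""
    n = len(B)
    C = [[0] * len(B[0]) for _ in A]
    steps = 0
    for k0 in range(0, n, panel):
        # Stand-in for broadcasting a pivot column panel of A along grid rows
        # and a pivot row panel of B along grid columns.
        steps += 1
        accumulate_panel(A, B, C, range(k0, min(k0 + panel, n)))
    return C, steps


def two_level_pivot_multiply(A, B, outer, inner):
    """Two-level scheme: wide panels between groups, narrow panels inside."""
    n = len(B)
    C = [[0] * len(B[0]) for _ in A]
    inter_steps = 0
    intra_steps = 0
    for K0 in range(0, n, outer):
        # Stand-in for one inter-group broadcast of a wide pivot panel.
        inter_steps += 1
        K1 = min(K0 + outer, n)
        for k0 in range(K0, K1, inner):
            # Stand-in for one intra-group broadcast of a narrow pivot panel.
            intra_steps += 1
            accumulate_panel(A, B, C, range(k0, min(k0 + inner, K1)))
    return C, inter_steps, intra_steps


if __name__ == "__main__":
    n = 8
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i * j + 1 for j in range(n)] for i in range(n)]
    C_flat, steps = flat_pivot_multiply(A, B, panel=2)
    C_two, inter, intra = two_level_pivot_multiply(A, B, outer=4, inner=2)
    print("results agree:", C_flat == C_two)
    print("flat steps:", steps, "inter-group:", inter, "intra-group:", intra)
```

Both routines compute the same product; the difference is only in how the pivot panels are grouped, which is what a communication cost model over the number of groups would then compare.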