Efficiently solving tri-diagonal system by chunked cyclic reduction and single-GPU shared memory
详细信息    查看全文
  • 作者:Di Zhao ; Jinhang Yu
  • 关键词:GPU computing ; Parallel algorithm ; GPU shared memory ; Chunked cyclic reduction ; Tri ; diagonal system
  • 刊名:The Journal of Supercomputing
  • 出版年:2015
  • 出版时间:February 2015
  • 年:2015
  • 卷:71
  • 期:2
  • 页码:369-390
  • 全文大小:2,102 KB
  • 参考文献:1. Golub GH, Van Loan CF (1996) Matrix computations. Johns Hopkins University Press, Baltimore
    2. Niemeyer K, Sung C-J (2014) Recent progress and challenges in exploiting graphics processors in computational fluid dynamics. J Supercomput 67(2):528-64
    3. Wang Y et al (2013) A parallel solver for incompressible fluid flows. Procedia Comput Sci 18:439-48 CrossRef
    4. Wei Z et al (2013) Parallelizing alternating direction implicit solver on GPUs. Procedia Comput Sci 18:389-98 CrossRef
    5. Curnier A (1994) Computational methods in solid mechanics. Kluwer Academic, Dordrecht CrossRef
    6. Fung Y, Tong P (2001) Classical and computational solid mechanics. World Scientific, Singapore CrossRef
    7. Bathe KJ (2001) Computational fluid and solid mechanics. Elsevier, Amsterdam
    8. Rylander T, Bondeson A, Ingelstr?m P (2012) Computational electromagnetics. Springer, Berlin
    9. Sheng XQ, Song W (2012) Essentials of computational electromagnetics. Wiley, New York CrossRef
    10. Levy G (2004) Computational finance: numerical methods for pricing financial instruments. Elsevier Butterworth-Heinemann, Oxford
    11. Los CA (2001) Computational finance: a scientific perspective. World Scientific, Singapore
    12. Duan JC, H?rdle W, Gentle JE (2011) Handbook of computational finance. Springer, Berlin
    13. Levy G (2008) Computational finance using C and C#. Elsevier, Amsterdam
    14. Lyuu YD (2002) Financial engineering and computation: principles, mathematics, algorithms. Cambridge University Press, Cambridge
    15. Nguyuen H, Corporation N (2008) GPU Gems 3. Addison Wesley Professional, Reading
    16. Pharr M, Fernando R (2005) GPU Gems 2: programming techniques for high-performance graphics and general-purpose computation. Pearson Addison Wesley Professional, Reading
    17. Hockney RW, Jesshope CR (1988) Parallel computers 2: architecture, programming, and algorithms. A. Hilger, London
    18. Sweet R (1988) A parallel and vector variant of the cyclic reduction algorithm. SIAM J Sci Stat Comput 9(4):761-65 CrossRef
    19. Amodio P, Mastronardi N (1993) A parallel version of the cyclic reduction algorithm on a hypercube. Parallel Comput 19(11):1273-281 CrossRef
    20. Mattor N, Williams TJ, Hewett DW (1995) Algorithm for solving tri-diagonal matrix problems in parallel. Parallel Comput 21(11):1769-782 CrossRef
    21. Stone HS (1975) Parallel tri-diagonal equation solvers. ACM Trans Math Softw 1(4):289-07 CrossRef
    22. Schwandt H (1989) Cyclic reduction for tri-diagonal systems of equations with interval coefficients on vector computers. SIAM J Numer Anal 26(3):661-80 CrossRef
    23. Allmann S, Rauber T, Runger G (2001) Cyclic reduction on distributed shared memory machines. In: Proceedings of ninth Euromicro workshop on parallel and distributed processing, 2001
    24. Bekakos MP, Evans DJ (1993) Parallel cyclic odd–even reduction algorithms for solving Toeplitz tri-diagonal equations on MIMD computers. Parallel Comput 19(5):545-61 CrossRef
    25. Gallopoulos E, Saad Y (1989) A parallel block cyclic reduction algorithm for the fast solution of elliptic equations. Parallel Comput 10(2):143-59 CrossRef
    26. Sweet R (1977) A cyclic reduction algorithm for solving block tri-diagonal systems of arbitrary dimension. SIAM J Numer Anal 14(4):706-20 CrossRef
    27. Seal SK, Perumalla KS, Hirshman SP (2013) Revisiting parallel cyclic
  • 作者单位:Di Zhao (1) (2)
    Jinhang Yu (3)

    1. Center for Cognitive and Brain Science, The Ohio State University, Columbus, OH, 43210, USA
    2. College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
    3. Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, 43210, USA
  • 刊物类别:Computer Science
  • 刊物主题:Programming Languages, Compilers and Interpreters
    Processor Architectures
    Computer Science, general
  • 出版者:Springer Netherlands
  • ISSN:1573-0484
文摘
The tri-diagonal system comes from dynamic problems such as fluid simulation, and high efficiency is important for the success of these applications. In this paper, we develop completely GPU shared memory-based chunked cyclic reduction under the constraint of the capacity of the shared memory. Computational results show that GPU shared memory chunked cyclic reduction exhibits high efficiency by Nvidia TITAN with 48k shared memory, and GPU shared memory chunked cyclic reduction can solve a tri-diagonal system with 262,144-by-262,144 coefficient matrix in 1.768?ms. Computational results also show that GPU shared memory chunked cyclic reduction scales well to the sizes of coefficient matrix and the reduced systems. Altogether, since building completely on GPU shared memory, our solver may be faster than existing GPU solvers because of the efficiency of GPU shared memory, though the solubility of our solver is smaller than existing GPU solvers because of the capacity constraint of shared memory, where solubility means the solvable tri-diagonal system with the maximum size of the coefficient matrix by our solver.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700