Kernel density estimation in accelerators
  • Authors: Unai Lopez-Novoa; Alexander Mendiburu; Jose Miguel-Alonso
  • Keywords: Kernel density estimation; Performance analysis; OpenCL; Many-core processors; GPGPU
  • Journal: The Journal of Supercomputing
  • Publication year: 2016
  • Publication date: February 2016
  • Volume: 72
  • Issue: 2
  • Pages: 545–566
  • Full-text size: 1,231 KB
  • References:
    1. Agosta G, Barenghi A, Di Federico A, Pelosi G (2015) OpenCL performance portability for general-purpose computation on graphics processor units: an exploration on cryptographic primitives. Concurr Comput Pract Exp 27(14):3633–3660
    2. AMD (2013) APP OpenCL programming guide. http://developer.amd.com/tools/hc/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
    3. Cramer T, Schmidl D, Klemm M, an Mey D (2012) OpenMP programming on Intel Xeon Phi coprocessors: an early performance comparison. In: Proceedings of the many-core applications research community symposium, pp 38–44
    4. Danalis A, Marin G, McCurdy C, Meredith JS, Roth PC, Spafford K, Tipparaju V, Vetter JS (2010) The scalable heterogeneous computing (SHOC) benchmark suite. In: Proceedings of the 3rd workshop on general-purpose computation on graphics processing units, ACM, New York, GPGPU ’10, pp 63–74
    5. Elgammal A, Duraiswami R, Davis L (2003) Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 25(11):1499–1504
    6. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press Professional Inc, San Diego
    7. Jeffers J, Reinders J (2013) Intel Xeon Phi Coprocessor High Performance Programming, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco
    8. Jia H, Zhang Y, Long G, Xu J, Yan S, Li Y (2012) GPURoofline: a model for guiding performance optimizations on GPUs. Euro-Par Parallel Processing, Lecture Notes in Computer Science, vol 7484. Springer, Berlin, pp 920–932
    9. Khronos OpenCL Working Group, Munshi A (ed) (2008) The OpenCL specification. Khronos Group, Beaverton, OR
    10. Kim KH, Kim K, Park QH (2011) Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model. Comput Phys Commun 182(6):1201–1207
    11. Kirk DB, Hwu WmW (2010) Programming Massively Parallel Processors: A Hands-on Approach, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco
    12. Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the international symposium on code generation and optimization, CGO, pp 75–86
    13. Lee VW, Kim C, Chhugani J, Deisher M, Kim D, Nguyen AD, Satish N, Smelyanskiy M, Chennupaty S, Hammarlund P, Singhal R, Dubey P (2010) Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Comput Archit News 38(3):451–460
    14. Lopez-Novoa U, Mendiburu A, Miguel-Alonso J (2015a) A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans Parallel Distrib Syst 26(1):272–281
    15. Lopez-Novoa U, Sáenz J, Mendiburu A, Miguel-Alonso J (2015b) An efficient implementation of kernel density estimation for multi-core and many-core architectures. Int J High Perform Comput Appl 29(3):331–347
    16. Lopez-Novoa U, Sáenz J, Mendiburu A, Miguel-Alonso J, Errasti I, Esnaola G, Ezcurra A, Ibarra-Berastegi G (2015c) Multi-objective environmental model evaluation by means of multidimensional kernel density estimators: Efficient and multi-core implementations. Environ Model Softw 63:123–136
    17. Munshi A, Gaster B, Mattson TG, Fung J, Ginsburg D (2011) OpenCL Programming Guide, 1st edn. Addison-Wesley Professional, USA
    18. Nickolls J, Dally W (2010) The GPU computing era. IEEE Micro 30(2):56–69
    19. NVIDIA (2012) OpenCL best practices guide. http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/NVIDIA_OpenCL_BestPracticesGuide.pdf
    20. Pennycook S, Hammond S, Wright S, Herdman J, Miller I, Jarvis S (2013) An investigation of the performance portability of OpenCL. J Parallel Distrib Comput 73(11):1439–1450
    21. Seo S, Lee J, Jo G, Lee J (2013) Automatic OpenCL work-group size selection for multicore CPUs. In: Proceedings of the 22nd international conference on parallel architectures and compilation techniques (PACT), pp 387–397
    22. Sheather SJ (2004) Density estimation. Statist Sci 588–597
    23. Silverman BW (1986) Density estimation for statistics and data analysis. Chapman & Hall, London
    24. Torres Y, Gonzalez-Escribano A, Llanos DR (2013) uBench: exposing the impact of CUDA block geometry in terms of performance. J Supercomput 65(3):1150–1163
    25. Wang Y, Qin Q, See SCW, Lin J (2013) Performance portability evaluation for OpenACC on Intel Knights Corner and NVIDIA Kepler. In: HPC China 2013
    26. Weissbach R (2006) A general kernel functional estimator with general bandwidth: strong consistency and applications. J Nonparam Stat 18(1):1–12
    27. Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65–76
  • Affiliations: Unai Lopez-Novoa (1) (2)
    Alexander Mendiburu (1)
    Jose Miguel-Alonso (1)

    1. Department of Computer Architecture and Technology, Intelligent Systems Group, University of the Basque Country UPV/EHU, P. Manuel Lardizabal 1, 20018, San Sebastián, Gipuzkoa, Spain
    2. Deusto Institute of Technology, DeustoTech, University of Deusto, Avenida de las Universidades 24, 48007, Bilbao, Spain
  • Journal category: Computer Science
  • Journal subjects: Programming Languages, Compilers and Interpreters
    Processor Architectures
    Computer Science, general
  • Publisher: Springer Netherlands
  • ISSN: 1573-0484
Abstract
Kernel density estimation (KDE) is a popular technique for estimating the probability density function of a random variable. KDE is considered a fundamental data-smoothing algorithm, and it is a common building block in many scientific applications. In previous work we presented S-KDE, an efficient algorithmic approach to computing KDE that outperformed other state-of-the-art implementations, providing accurate results in much shorter execution times. Its parallel implementation targeted multi- and many-core processors. In this work we present an OpenCL implementation of S-KDE that targets modern accelerators in a portable way. We test our implementation on three accelerators from different manufacturers, achieving speedups of around 5× compared to a hand-tuned serial version of S-KDE. We also analyze the performance of the code on these accelerators to determine to what extent it exploits their capabilities.
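For readers unfamiliar with KDE, the following is a minimal sketch of the textbook direct estimator, f(x) = 1/(n·h) · Σ K((x − x_i)/h), evaluated on a grid of points. It is not S-KDE and not the paper's OpenCL code. The Gaussian kernel, Silverman's rule-of-thumb bandwidth (h ≈ 1.06·σ·n^(−1/5), Silverman 1986), and the function names `gaussian_kernel`, `silverman_bandwidth` and `kde` are illustrative assumptions. The sketch does show why the problem suits accelerators: each evaluation point is computed independently of every other.

```python
import math
from typing import List, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here (by assumption) as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(samples: Sequence[float]) -> float:
    """Silverman's rule-of-thumb bandwidth: h = 1.06 * sigma * n^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    return 1.06 * math.sqrt(var) * n ** (-0.2)


def kde(samples: Sequence[float], points: Sequence[float], h: float) -> List[float]:
    """Direct O(n * m) KDE: f(x) = 1/(n*h) * sum_i K((x - x_i) / h).

    Each entry of `points` is evaluated independently, so the outer loop is
    the natural unit of work to spread across parallel hardware.
    """
    n = len(samples)
    return [
        sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)
        for x in points
    ]


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.7, 2.1]
    h = silverman_bandwidth(data)
    grid = [-4.0 + 0.1 * k for k in range(81)]  # 81 points from -4.0 to 4.0
    density = kde(data, grid, h)
    # A density should integrate to about 1; approximate with a Riemann sum.
    print(f"h = {h:.3f}, integral ~ {sum(density) * 0.1:.3f}")
```

The direct form costs one kernel evaluation per (sample, point) pair. Algorithmic approaches such as the one the abstract describes aim to reduce that cost, while an OpenCL port would typically assign points (or blocks of points) to work-items.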
