Parallel strategies for 2D Discrete Wavelet Transform in shared memory systems and GPUs
  • Authors: V. Galiano (1)
    O. López (1)
    M. P. Malumbres (1)
    H. Migallón (1)
  • Keywords: Wavelet transform; Image coding; Parallel algorithms; CUDA; GPU; OpenMP
  • Journal: The Journal of Supercomputing
  • Publication date: April 2013
  • Volume: 64
  • Issue: 1
  • Pages: 4-16
  • Full text size: 668 KB
  • References: 1. Rao K, Yip P (1990) Discrete cosine transform: algorithms, advantages, applications. Academic Press, Boston
    2. ISO (2000) ISO/IEC 15444-1. JPEG2000 image coding system
    3. Said A, Pearlman WA (1996) A new, fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans Circuits Syst Video Technol 6(3):243-250
    4. Mallat SG (1989) A theory for multi-resolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674-693
    5. Sweldens W (1996) The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl Comput Harmon Anal 3(2):186-200
    6. Sweldens W (1998) The lifting scheme: a construction of second generation wavelets. SIAM J Math Anal 29(2):511-546
    7. Chrysafis C, Ortega A (2000) Line-based, reduced memory, wavelet image compression. IEEE Trans Image Process 9(3):378-389
    8. Bao Y, Jay Kuo CC (2001) Design of wavelet-based image codec in memory-constrained environment. IEEE Trans Circuits Syst Video Technol 11(5):642-650
    9. Hsia C-H, Guo J-M, Chiang J-S, Lin C-H (2009) A novel fast algorithm based on SMDWT for visual processing applications. In: IEEE international symposium on circuits and systems, ISCAS 2009, pp 762-765
    10. Wippig D, Klauer B (2011) GPU-based translation-invariant 2D discrete wavelet transform for image processing. Int J Comput 5(2):226-234
    11. Rost RJ (2006) OpenGL shading language, 2nd edn. Addison-Wesley, Reading
    12. Daubechies I, Sweldens W (1998) Factoring wavelet transforms into lifting steps. J Fourier Anal Appl 4(3):247-269
    13. Shapiro JM (1993) Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans Signal Process 41(12):3445-3462
    14. OpenMP Architecture Review Board (2002) OpenMP C and C++ application program interface, version 2.0
    15. Nickolls J, Buck I, Garland M, Skadron K (2008) Scalable parallel programming with CUDA. Queue 6:40-53
    16. NVIDIA Corporation (2010) NVIDIA CUDA C programming guide, version 3.2
  • Author affiliations: V. Galiano (1)
    O. López (1)
    M. P. Malumbres (1)
    H. Migallón (1)

    1. Physics and Computer Architecture Department, Miguel Hernández University, 03202, Elche, Spain
  • ISSN: 1573-0484
Abstract
In this work, we analyze the behavior of several parallel algorithms developed to compute the two-dimensional discrete wavelet transform, using both OpenMP on a multicore platform and CUDA on a GPU. The proposed parallel algorithms are based on both regular filter-bank convolution and the lifting transform, with small implementation changes aimed at reducing both memory requirements and complexity. We compare our implementations against sequential CPU algorithms and against other recently proposed algorithms, such as the SMDWT algorithm on different CPUs and the Wippig and Klauer algorithm on a GTX280 GPU. Finally, we analyze their behavior when the algorithms are adapted to each architecture. Significant improvements in execution time are achieved on both multicore platforms and GPUs. Depending on the multicore platform used, we achieve speed-ups of 1.9 and 3.4 using two and four processes, respectively, when compared to the sequential CPU algorithm, or speed-ups of 7.1 and 8.9 using eight and ten processes. On GPUs, the convolution algorithm that uses GPU shared memory obtains speed-ups of up to 20 when compared to the sequential CPU algorithm.
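
The multicore speed-ups quoted in the abstract can be turned into parallel efficiency (speed-up divided by the number of processes) to see how close each configuration is to ideal scaling. The sketch below uses only the four figures given in the abstract; the names `reported_speedups` and `parallel_efficiency` are illustrative, and the efficiency metric is a standard derived quantity, not one the abstract itself reports.

```python
# Speed-ups quoted in the abstract, keyed by number of processes.
# The abstract says they depend on the multicore platform used; it does not
# say which platform each figure was measured on, so they are simply listed.
reported_speedups = {2: 1.9, 4: 3.4, 8: 7.1, 10: 8.9}


def parallel_efficiency(speedup, processes):
    """Speed-up divided by process count; 1.0 would be ideal linear scaling."""
    return speedup / processes


# Derived values (not reported in the abstract).
efficiencies = {
    procs: parallel_efficiency(speedup, procs)
    for procs, speedup in reported_speedups.items()
}

for procs, eff in efficiencies.items():
    print(f"{procs:>2} processes: speed-up {reported_speedups[procs]:.1f}, "
          f"efficiency {eff:.2f}")
```

Under this reading, every reported configuration stays above 0.8 efficiency, which fits the abstract's description of the improvements as significant.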