模式识别并行算法与GPU高速实现研究

作者：张舒
论文级别：硕士
学科专业名称：信息与通信工程
中文关键词：GPU ; 模式识别 ; 奇异值分解 ; AC多模式匹配 ; KFCM算法
英文关键词：GPU ; Pattern recognition ; SVD ; Aho-Corasick ; KFCM
学位年度：2009
导师：窦衡
学科代码：081001
学位授予单位：电子科技大学
论文提交日期：2009-04-01

摘要

随着多核、众核处理器成为计算设备的主流,模式识别在未来并行系统中的实现将需要对应的并行算法研究作为其理论基础。另一方面,基于GPU的通用计算能够提供强大的计算能力和存储器带宽,同时具有良好的可编程性,较低的成本和较短的开发周期。因此,模式识别并行算法与GPU高速实现研究,对高速并行模式识别系统的建立、完善和推广具有重要的意义。
     本文详细分析了Tesla GPU图形与计算架构和CUDA统一计算设备架构,详细描述了如何对计算任务进行并行分解,并通过CUDA的双层并行编程模型映射到Tesla GPU上。
     在本文的实现部分,以软件开发流程为主线,描述了如何利用CUDA实现模式识别中的三种常用算法:特征提取中的奇异值分解(Singular Value Decomposition,SVD)、近邻法中的核模糊C均值聚类(Kernel-based Fuzzy C-means, KFCM)和AC(Aho-Corasick)多模式匹配算法。
     奇异值分解作为矩阵计算、分析的强大工具之一,在统计分析、信号与图象处理、系统理论和控制中被广泛地应用。本文分析了SVD算法的常用数值算法,并比较了各算法在CUDA实现的可行性,从中选择了单边雅可比算法作为基于CUDA的SVD实现基础。在分析单边雅可比算法不足的基础上,提出了一种改进的SVD算法,并以测试结果说明了改进手段的有效性。
     模糊C均值(FCM, Fuzzy C-Means)算法是模糊聚类最流行和应用最广泛的一种算法,在许多领域都取得了非常成功的效果,核模糊C均值算法(KFCM)在FCM的基础上进行了改进,通过引入核函数提高了类别间的可分性。
     多模式匹配算法在模式识别、生物计算、搜索引擎、病毒防治、入侵检测等领域有许多重要应用。AC算法是目前使用较广的一种多模式匹配算法。
     本文针对KFCM算法和AC算法进行了讨论,给出了它们的数据并行形式和CUDA实现方案,以测试结果验证了并行算法的设计,并与基于CPU和FPGA的其他现有实现方案进行了比较。
As multi-core and many-core processors have become the main stream of computing systems, it is necessery to research the parallel form of pattern recognition algorithms for realizations on future parallel systems. On the other hand, GPU general purpose computing based on CUDA is able to provide strong computing ability and high memory band width, and has a good performance on programmability, cost and development cycle. Accordingly, the research of parallel algorithms for pattern recognition and their CUDA implementations have an important practical significance.
     This dissertation analyzes the Tesla GPU graphics and computing architecture, and the CUDA computing unified device architecture. It described how to decompose a compute work load to parallel form, and map it to Tesla GPU through the leveled programming model of CUDA.
     In the implement part of this dissertation, according to the software development process, it tells how to implement three classic pattern recognition algorithms using CUDA. The algorithms are: Singular Value Decomposition (SVD) for feature extraction, Kernel-based Fuzzy C-Means (KFCM) nearest neighbor algorithm, and the Aho-Corasick multi-pattern matching algorithm.
     SVD is widely used as a powerful tool of matrix analysis and calculation in the fields include but not limited to statistical analysis, signal processing, image processing, system theory and control.This dissertation discusses the numerical algorithms for SVD, and implements the one-sided Jacobi method on GPU. Based on the analysis of the result of one-sided Jacobi method, it gives a improved SVD method, and validate its effectiveness.
     FCM (Fuzzy C-Means) is the most widely used fuzzy clulster algorithm, which has been successfully applied in many fileds. KFCM is a variant of FCM, improved the classifiability of clusters by employing kernel functions.
     Multi-pattern matching has many important applications in the fileds include but not limited to pattern recongniton, bio-computing, search engine, anti virus and invasion detection. AC multi-pattern matching is one of the most widely used multi-pattern matching algorithms.
     The parallel form and CUDA implementation for KFCM and Aho-Corasick algorithm is discussed, validated by test result, and compared with CPU and FPGA implementations.

引文

[1]边肇祺,张学工等.模式识别.北京:清华大学出版社, 2000
    [2] Shameem Akhter, Jason Roberts.多核程序设计技术.北京:电子工业出版社, 2007
    [3] John D. Owens. A Surey of General-Purpose Computation on Graphics Hardware, EUROGRAPHICS 2005
    [4] NVidia, NVidia CUDA Compute Unified Device Architecture progranmming guide, version 1.0. 2007
    [5] David kirk. NVIDIA's GT200-- Inside a Parallel Processor http://www.realworldtech.com/page.cfm?ArticleID=RWT090808195242, 2008
    [6] Vasily Volkov, James W. Demmel. Benchmarking GPUs to Tune Dense Linear Algebra. Conference on High Performance Networking and Computing
    [7] Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman etc. Brook for GPUs: Stream Computing on Graphics Hardware.SIGGRAPH 2004
    [8] AMD, firestream-sdk-whitepaper, 2007
    [9] Viney Bondhugula, Naga Govindaraju, Dinesh Manocha, Fast SVD on Graphics Processors. http://www.cs.unc.edu/~geom/Numeric/svd/,2007
    [10] JUFFA, N. CUDA CUBLAS Library, first ed. NVIDIA Corporation, 2701 San TomanExpressway, Santa Clara, CA 95050, USA, 2007.
    [11] LESSIG, C. Eigenvalue Computation with CUDA. Tech. rep., NVIDIA Corporation, August 2007.
    [12] Sheetal Lahabar, Pinky Agrawal, P. J. Narayanan, High Performance Pattern Recognition on GPU. Proceedings of National Conference on Computer Vision Pattern Recognition Image Processing and Graphics (NCVPRIPG'08), DA-IICT, Gandhinagar, India, pp.154-159, 2008.
    [13] Christian Lessig. An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor. Technical Report, University of Toronto, 2008
    [14] Wenbin Fang, Ka Keung Lau, Mian Lu. Parallel Data Mining on Graphics Processors. Technical Report HKUSTCS0807 Oct 2008
    [15] Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis. Gnort: High Performance Network Intrusion Detection Using Graphics Processors. In Proceedings of the 11thInternational Symposium On Recent Advances In Intrusion Detection (RAID). Boston, MA, USA, 2008
    [16]刘本永.子空间法雷达目标一维像识别:[博士学位论文],成都,电子科技大学,1999
    [17]周代英.雷达目标一维距离像识别研究:[博士学位论文],成都,电子科技大学,2001
    [18]陈璋鑫.子空间法雷达目标一维像识别算法的FPGA实现研究:[硕士学位论文],成都,电子科技大学,2003
    [19]刘本永,杨万麟.基于正则变换的雷达目标成像识别.系统工程与电子技术.1999,21(3):31-33
    [20] Zhang D Q, Chen S C.Kernel- based fuzzy clustering incorporating spatial constraints for image segmentation.Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi’an, 2003- 11.
    [21] Aho AV, Corasick MJ. Efficient string matching: An aid to bibliographic search. Communications of ACM, 1975, 18(6):333-340
    [22]佚名.高性能计算:性能与效能孰重孰轻?http://server.zol.com.cn/108/1083937.html
    [23] Erik Lindholm, John Nickolls, Stuart Oberman. NVIDIA Tesla: a unified graphics and computing architecture. Hot chips 2008 March-April, Page(s)39-55.
    [24] C.Xavier, S.S.Iyengar.并行算法导论.北京:机械工业出版,2004
    [25] Ananth Grama, Anshul Gupta, George Karypis, etc.并行计算导论,北京:机械工业出版社, 2005
    [26] JACOBI, C. G. J. U¨ber ein leichtes verfahren die in der theorie der sa¨cula¨rsto¨rungen vorkommenden gleichungen numerisch aufzul¨osen. 51–94.
    [27] M.R.Hetenes. Inversion of matrices by biorthogonalization and related results. JSoc. Indust. Appl. Math 6(1). March, 1958: 51-90
    [28] Golub G H,Luk F T. Singular value decomposition: application and computations. Trans 22nd Conf Army Mathematians, ARO Report.1977. 77(1)(577-605)
    [29] Brent R P, Luk F T.The solution of singular value and symmetric eigenproblems on multiprocessor arrays. Sct Stat Comput. 1985,6:69-84
    [30] Sarang Dharmapurikar, John Lockwood. Fast and Scalable Pattern Matching for Pattern filtering. ACM press, 2005: 183-192
    [31] W.H.Press, S.A.Teukolsky, W.T.Vetterling,B.P.Flannery. C语言数值算法程序大全(第二版)电子工业出版社/ 1995
    [32] DHILLON, I. S. A New O ( n2 ) Algorithm for the Symmetric TridiagonalEigenvalue/Eigenvector Problem. PhD thesis, EECS Department, University of California,Berkeley, 1997
    [33] GROSSER, B., AND LANG, B. Efficient Parallel Reduction to Bidiagonal Form. Parallel Comput. 25, 8 (1999), 969–986.
    [34] GROSSER, B. Ein paralleler und hochgenauer O(n2) Algorithmus f¨ur die bidiagonaler Singul¨arwertzerlegung. PhD thesis, Bergische Universit¨at Wuppertal, Fachbereich Mathematik, Wuppertal, Germany, 2001. In German.
    [35] Bezek J C.Pattern recognition with fuzzy object function algorithms[M].New York: Plenum, 1981.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700