Research on GPU General-Purpose Computing and a Parallel Image Matching Algorithm Based on SIFT Features

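English title: GPGPU and Image Matching Parallel Algorithm Based on SIFT
Author: Nian Hua (年华)
Thesis level: Master's
Discipline: Computer Software and Theory
Chinese keywords: GPU通用计算 (GPU general-purpose computing) ; CUDA并行计算 (CUDA parallel computing) ; 图像匹配 (image matching) ; SIFT特征算子 (SIFT feature operator)
English keywords: GPGPU ; CUDA ; Parallel Computing ; Image Matching ; SIFT
Degree year: 2010
Advisor: Liu Xiyang (刘西洋)
Discipline code: 081202
Degree-granting institution: Xidian University (西安电子科技大学)
Thesis submission date: 2010-01-01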

Abstract

General-purpose computing on GPUs (GPGPU) is an active research topic worldwide. Early GPGPU development programmed the GPU directly through graphics APIs, an approach that was difficult and costly. CUDA (Compute Unified Device Architecture), introduced by NVIDIA, is a platform designed specifically for general-purpose GPU computing. Its simple programming style and efficient multithreaded parallel processing model let developers make better use of the GPU's large pool of parallel computing resources on compute-intensive tasks.
     In image matching, SIFT is a feature-point-based matching algorithm that handles matching between two images related by translation, rotation, or affine transformation. Its strong matching ability and robustness have led to wide use in the field.
     This thesis analyzes the hardware architecture and software stack behind the CUDA multithreaded programming model, and describes task partitioning, performance evaluation, and optimization strategies for CUDA programs. It also compares the GT200 architecture with the next-generation Fermi architecture, pointing out Fermi's design improvements and performance advantages.
     The implementation part describes the parallel design and implementation of a SIFT-based image matching algorithm on the CUDA platform and compares it with a CPU implementation. The experimental results show that the CUDA implementation achieves a good speedup over the CPU implementation.
