Robot Object Recognition and Localization Based on Local Invariant Image Features
Abstract
Robot visual cognition has long been a research focus, with the goal of enabling robots to recognize the objects around them intelligently. Yet even for a very simple object, machine recognition is far from easy. The most critical issue is the representation, or description, of the object: what kind of features can distinguish one object from another? Research on local invariant image features in recent years appears to offer a way to solve this problem. The core of local invariant features is invariance itself. When a person identifies an object, it can be recognized whether it is near or far; this is scale invariance. When the object rotates, it can still be recognized accurately; this is rotation invariance. Giving robots the same recognition ability is the problem that research on local invariant image features aims to solve.
     This thesis analyzes and describes in detail local invariant image features and the calibration of a robot's binocular cameras. First, SIFT and SURF local feature vectors, which are invariant to scale, rotation, and brightness changes, are extracted from the images. An improved KD-tree algorithm then matches the SIFT and SURF feature points between images, and the perspective transformation between the images determines the position of the template object in the scene image. Next, Zhang's camera calibration method is used to compute the intrinsic parameter matrix of each camera, and epipolar geometry gives the relative pose between the two cameras. Finally, the triangulation principle of binocular vision determines the position of the object in space.
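The feature-matching step described above (SIFT/SURF descriptors matched through a KD-tree) can be sketched as follows. This is a minimal illustration rather than the thesis's improved KD-tree: it uses SciPy's `cKDTree` with Lowe's ratio test, and random 128-dimensional vectors stand in for real SIFT descriptors.

```python
import numpy as np
from scipy.spatial import cKDTree

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a against desc_b via a KD-tree,
    keeping only matches that pass Lowe's ratio test."""
    tree = cKDTree(desc_b)
    dists, idxs = tree.query(desc_a, k=2)   # two nearest neighbours each
    matches = []
    for i, ((d1, d2), (j1, _)) in enumerate(zip(dists, idxs)):
        if d1 < ratio * d2:                 # best match is clearly better
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(0)
scene = rng.normal(size=(200, 128))         # stand-ins for 128-D SIFT descriptors
template = scene[:50] + rng.normal(scale=0.01, size=(50, 128))  # noisy copies
matches = ratio_test_match(template, scene)
print(len(matches))                         # all 50 planted matches survive
```

In the actual pipeline the descriptors would come from SIFT/SURF detectors run on the template and scene images; the KD-tree makes each nearest-neighbour query logarithmic rather than linear in the number of scene features.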
     Experimental results show that image features described with SIFT and SURF remain stable under rotation, scaling, and partial occlusion; that the object transformation estimated with the RANSAC algorithm locates the object in the image reliably; and that, given the camera calibration results, the object's spatial position is determined accurately enough for robot manipulation.
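The RANSAC-based localization can be illustrated with a small self-contained NumPy sketch (synthetic correspondences, not the thesis's data or code): a perspective transform is fit by the direct linear transform inside a RANSAC loop, and the inlier model then projects the template's corners into the scene image.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: least-squares homography from point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply a homography to an (n, 2) array of points."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:]

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    """Estimate a homography robustly from correspondences with outliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)   # minimal 4-point sample
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best  # refit on all inliers

# synthetic template-to-scene correspondences: 40 inliers, 10 gross outliers
H_true = np.array([[1.1, 0.02, 30.0],
                   [-0.01, 0.95, 12.0],
                   [1e-4, 2e-4, 1.0]])
rng = np.random.default_rng(1)
src = rng.uniform(0, 400, size=(50, 2))
dst = apply_h(H_true, src)
dst[40:] += rng.uniform(50, 100, size=(10, 2))   # corrupt the last 10 matches

H, inliers = ransac_homography(src, dst)
corners = np.array([[0, 0], [400, 0], [400, 300], [0, 300]], float)
located = apply_h(H, corners)    # template corners located in the scene image
```

The RANSAC loop discards the corrupted matches, so the refit homography recovers the template's outline in the scene despite 20% outliers.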
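The last step, recovering the object's position in space from the two calibrated views, rests on linear triangulation. The sketch below uses assumed intrinsics and a hypothetical 10 cm baseline (illustrative values, not the thesis's rig) and triangulates one point from its two image projections by the DLT method.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two views.
    P1, P2 are 3x4 projection matrices; x1, x2 are (u, v) pixel coords."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # null vector of A, homogeneous point
    return X[:3] / X[3]           # back to inhomogeneous coordinates

def project_point(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# assumed calibration results (illustrative, not the thesis's values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # second camera: pure translation
t = np.array([[-0.1], [0.0], [0.0]])   # 10 cm baseline along x
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

X_true = np.array([0.2, -0.1, 2.0])    # a point 2 m in front of the rig
X_est = triangulate(P1, P2, project_point(P1, X_true), project_point(P2, X_true))
print(X_est)    # recovers [0.2, -0.1, 2.0] up to numerical error
```

In practice K comes from Zhang's calibration and (R, t) from the epipolar-geometry step; the same DLT system then extends directly to noisy pixel measurements, where the SVD gives the least-squares solution.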
