Real-Time Visual Localization with a Monocular Camera
Abstract
Vision-based self-localization is one of the key technologies for autonomous navigation of mobile robots. It faces three main difficulties: how to make the vision system robust enough to cope with changing natural environments, how to recover depth information from a single camera accurately enough to determine the robot's pose, and how to achieve real-time performance that keeps up with the speed and agility of the robot's motion. This thesis studies these problems in depth and builds a complete visual localization system that captures scene images with a single camera and computes, in real time, the camera's 3D pose relative to reference landmarks.
    First, we review existing visual localization and navigation algorithms and propose an architecture for real-time monocular localization. Approaching the problem from the standpoint of vision and image processing, the architecture combines object recognition based on local invariant features, feature tracking, and pose estimation: the system first recognizes visual landmarks learned off-line in the scene, then tracks the recognized landmarks across video frames in real time while simultaneously computing the camera's 3D pose relative to them. The intrinsic connections among the three modules are fully exploited, and parallel computation is used to maximize run-time performance, as sketched below.
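The dual-thread idea can be pictured with a short sketch. This is a minimal illustration, not the thesis code: `recognize_landmarks`, `track_landmarks`, and `estimate_pose` are hypothetical placeholders, and a slow recognition thread feeds its results to a fast per-frame tracking loop.

```python
# Minimal sketch of the parallel structure: a slow recognition thread
# searches the newest frame for known landmarks while the main loop
# tracks already-recognized landmarks and estimates pose every frame.
# recognize_landmarks / track_landmarks / estimate_pose are placeholders.
import threading, queue
import cv2

latest_frame = None
frame_lock = threading.Lock()
found = queue.Queue()  # recognition results handed over to the tracker

def recognition_worker(stop_event):
    while not stop_event.is_set():
        with frame_lock:
            frame = None if latest_frame is None else latest_frame.copy()
        if frame is not None:
            for landmark in recognize_landmarks(frame):  # slow, asynchronous
                found.put(landmark)

def main_loop():
    global latest_frame
    stop_event = threading.Event()
    threading.Thread(target=recognition_worker, args=(stop_event,),
                     daemon=True).start()
    cap = cv2.VideoCapture(0)
    tracked = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        with frame_lock:
            latest_frame = frame
        while not found.empty():            # adopt newly recognized landmarks
            tracked.append(found.get_nowait())
        tracked = track_landmarks(frame, tracked)  # fast, runs every frame
        if tracked:
            pose = estimate_pose(tracked)   # camera pose w.r.t. the landmarks
    stop_event.set()
```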
    Second, we propose the Harris-SIFT feature detector, analyze its principle, and describe its improvements and advantages over SIFT. We then present the Harris-SIFT-based object recognition system in detail, covering construction of the landmark database, feature extraction, approximate nearest-neighbor matching, consistency checking, and evaluation of recognition results. This recognition system is robust, accurate, and fast; it forms the core of the localization system and allows localization to run reliably in changing natural environments.
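As a rough illustration of the detector and the matching stage, the sketch below assumes one plausible reading of Harris-SIFT: SIFT descriptors are computed only at Harris corner locations, avoiding SIFT's full scale-space keypoint detection (the thesis detector may differ in details such as scale selection). Matching uses approximate nearest neighbors with Lowe's ratio test.

```python
# Sketch (under the assumptions stated above) of Harris-SIFT extraction
# followed by approximate nearest-neighbor matching with a ratio test.
import cv2

def harris_sift(gray, max_corners=500):
    # Harris corners give cheap, well-localized keypoint positions.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=5,
                                      useHarrisDetector=True, k=0.04)
    kps = [cv2.KeyPoint(float(x), float(y), 8.0) for [[x, y]] in corners]
    # SIFT descriptors are then computed only at those corner locations.
    return cv2.SIFT_create().compute(gray, kps)  # -> (keypoints, descriptors)

def ratio_match(desc_query, desc_db, ratio=0.8):
    # FLANN-based approximate nearest-neighbor search with Lowe's ratio test:
    # keep a match only if it is clearly better than the second-best one.
    flann = cv2.FlannBasedMatcher()
    pairs = flann.knnMatch(desc_query, desc_db, k=2)
    return [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
```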
    Third, we study the tracking and localization algorithms, analyze the feasibility and benefit of combining recognition with tracking, and describe the design and implementation of the dual-thread parallel computing structure. We then present the coplanar POSIT pose estimation algorithm and explain how it is integrated with the tracking and recognition modules. To obtain the 3D coordinates of feature points on the reference object, we design and use an inverse perspective imaging model, which requires a calibrated camera; Zhang's calibration method is briefly reviewed for this purpose.
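To make the pose step concrete, here is a compact sketch in which OpenCV's planar solvePnP solver stands in for the coplanar POSIT algorithm used in the thesis (the two solve the same coplanar pose problem by different means). KLT optical flow carries the landmark's 2D feature points from frame to frame, and the camera matrix `K` and distortion coefficients `dist` are assumed to come from an off-line Zhang calibration (e.g. `cv2.calibrateCamera`).

```python
# Per-frame tracking plus pose from coplanar points. cv2.solvePnP with
# the planar IPPE solver is used here in place of coplanar POSIT; the
# object points all lie on the landmark plane (Z = 0).
import cv2

def track_and_pose(prev_gray, gray, prev_pts, object_pts, K, dist):
    # KLT optical flow moves the tracked 2D points into the new frame.
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                      prev_pts, None)
    good = status.ravel() == 1
    img_pts, obj_pts = next_pts[good], object_pts[good]
    if len(img_pts) < 4:                  # planar pose needs >= 4 points
        return None
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist,
                                  flags=cv2.SOLVEPNP_IPPE)
    # rvec/tvec give the camera pose relative to the landmark plane.
    return (rvec, tvec) if ok else None
```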
    Finally, building on the work above, a series of experiments verifies the performance of the algorithms, including comparisons of Harris-SIFT with related feature detectors and object recognition and image retrieval in natural environments; these experiments show that Harris-SIFT-based recognition is robust, accurate, and fast. In addition, the full localization algorithm is run on a live video stream from a single hand-held USB camera that moves rapidly and freely through the scene. The results show that the algorithm recognizes multiple natural landmarks simultaneously and outputs the camera's relative 3D pose in real time with reliable accuracy, fully meeting the design requirements.
