Research on 3D Object Registration Methods in Augmented Reality and Their Applications
Abstract
Augmented Reality (AR) has been a research hotspot in recent years and has broad application prospects. Unlike traditional Virtual Reality (VR), an augmented reality system uses registration techniques to display computer-generated virtual information in the user's field of view, seamlessly fusing the virtual information with the real environment at the perceptual level and thereby enhancing the user's ability to perceive and interact with the real world.
     Although augmented reality registration methods for planar objects have been studied extensively, registration methods for three-dimensional objects still face serious limitations. Existing computer-vision-based methods for 3D object recognition and registration generally suffer from large search spaces and long computation times. To address these problems, the research work of this thesis covers the following aspects:
     First, for markerless 3D object recognition and registration, a real-time 3D object recognition method based on discrete gradient features is proposed. The method discretizes the appearance of a 3D object into a series of views, extracts from each view a number of sub-regions rich in gradient information, and combines these sub-regions into a composite model for recognition. During recognition, the gradient information in the image is converted into discrete gradient features, and a rotation-direction binary encoding scheme together with parallel computation based on the SSE 4.2 CPU instruction set greatly improves the recognition efficiency, raising the otherwise computation-heavy view-based 3D object recognition to the real-time level required by augmented reality systems (frame rate ≥ 15 fps). The machine-learning-based FAST algorithm is then used to extract feature points from the image, and the BRIEF descriptor is used to describe and match them.
     Second, for camera pose estimation, the core problem of augmented reality registration, this thesis studies the Perspective-3-Point (P3P) problem and the Perspective-n-Point (PnP) problem and proposes new theory and methods that greatly improve the real-time performance and stability of camera pose computation. (1) For the P3P problem, a direct solution based on a perspective-similar-triangle geometric constraint is proposed; it reduces the number of unknowns in the equation system, lowers the complexity of the equations, and significantly improves the numerical stability and accuracy of the solution. (2) The configuration of 2D/3D point correspondences in the PnP problem is studied, and a new intermediate state, the "quasi-singular" case, is identified; a stable and efficient algorithm, RPnP, is proposed that effectively resolves the stability degradation in quasi-singular cases. RPnP is the first non-iterative PnP algorithm that obtains more accurate solutions than iterative algorithms when redundant reference points are lacking (n ≤ 5), and it is efficient enough to handle large point sets.
     Third, to make marker-based registration of 3D objects real-time, a lookup-table (LUT) based marker registration algorithm is proposed. The algorithm achieves high stability under noise while requiring very little computation, making it especially suitable for augmented reality applications on mobile devices with limited computing resources.
     Finally, for product assembly in a mixed virtual-real environment, an augmented reality assembly prototype system is developed in which augmented information guides the user through the correct assembly operations. The prototype system validates the effectiveness of the augmented reality 3D object registration methods proposed in this thesis.
Augmented Reality (AR) is a research hotspot with a wide range of applications and broad prospects. Unlike traditional Virtual Reality (VR), AR augments the real environment with computer-generated virtual content to enhance the user's perception. The virtual content and the real environment are seamlessly integrated, and the interface between the user and the real world is enriched.
     In augmented reality, compared with registration methods for planar targets, 3D object registration still suffers from problems such as a huge search space and long computation times. To address these problems, the research work of this thesis is as follows:
     First, for the markerless 3D object recognition and registration problem, a real-time object recognition algorithm based on discrete gradient features is presented. The appearance of the object is recorded as a series of views, and each view is divided into several information-rich sub-regions; recognition is performed using the discrete gradient features. Binary encoding and SSE-based parallel computation significantly improve the efficiency of the view-based 3D object recognition task, meeting the frame-rate requirement of AR systems (≥ 15 fps). The machine-learning-based FAST algorithm is then used to extract feature points, and the BRIEF descriptor is used to describe and match them.
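     As context for the efficiency claim above: both the discretized gradient codes and BRIEF descriptors are bit strings, so matching reduces to Hamming distances, which the POPCNT instruction introduced alongside SSE 4.2 evaluates one machine word at a time. The sketch below illustrates only this general pattern, under an assumed 256-bit descriptor layout and illustrative names; it is not the thesis's implementation.

```cpp
#include <array>
#include <cstdint>
#include <limits>
#include <vector>

// Illustrative only: a 256-bit binary descriptor (e.g. BRIEF, or a packed
// discrete-gradient code) stored as four 64-bit words.
using Descriptor256 = std::array<std::uint64_t, 4>;

// Hamming distance between two descriptors; __builtin_popcountll compiles
// to the POPCNT instruction when the target CPU supports SSE 4.2.
inline int hammingDistance(const Descriptor256& a, const Descriptor256& b) {
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        dist += __builtin_popcountll(a[i] ^ b[i]);
    return dist;
}

// Brute-force nearest-neighbour search of a query descriptor against a
// database of model descriptors; returns the index of the best match.
inline int bestMatch(const Descriptor256& query,
                     const std::vector<Descriptor256>& database) {
    int best = -1;
    int bestDist = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i < database.size(); ++i) {
        const int d = hammingDistance(query, database[i]);
        if (d < bestDist) { bestDist = d; best = static_cast<int>(i); }
    }
    return best;
}
```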
     Second, new theory and methods are proposed to improve the stability and efficiency of camera pose estimation, the core of AR registration. (1) A direct solution of the Perspective-3-Point (P3P) problem based on a new geometric constraint, the Perspective Similar Triangular (PST), is presented. The new constraint reduces the number of unknowns in the equation system, which significantly improves the numerical stability and accuracy of the solution. (2) The configuration of 2D/3D point correspondences in the Perspective-n-Point (PnP) problem is studied, and a new intermediate state, termed "quasi-singular", is identified. A highly robust algorithm, RPnP, is designed to resolve the stability degradation in the quasi-singular case. RPnP is the first non-iterative PnP solution that achieves more accurate results than iterative algorithms when no redundant reference points are available (n ≤ 5), and it handles large point sets efficiently owing to its O(n) computational complexity.
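     For readers unfamiliar with P3P, the classical constraint system (Grunert's formulation) that any P3P solver must satisfy is reproduced below in standard notation; the notation is not taken from the thesis. The proposed PST-based direct solution is stated to reduce the number of unknowns relative to solving this system for the three depths, after which the camera pose follows from a standard absolute-orientation step.

```latex
% Classical P3P constraints: unknown camera-to-point depths s_1, s_2, s_3;
% inter-ray angles \alpha, \beta, \gamma measured from the image; and known
% model-side distances a = |P_2 P_3|, b = |P_1 P_3|, c = |P_1 P_2|.
\begin{aligned}
  s_2^2 + s_3^2 - 2\, s_2 s_3 \cos\alpha &= a^2,\\
  s_1^2 + s_3^2 - 2\, s_1 s_3 \cos\beta  &= b^2,\\
  s_1^2 + s_2^2 - 2\, s_1 s_2 \cos\gamma &= c^2.
\end{aligned}
```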
     Third, an efficient lookup-table (LUT) based camera pose estimation method for fiducial markers is presented. It achieves high stability in the presence of noise at very little computational cost, making it well suited to AR applications on mobile devices with limited computing resources.
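     The abstract does not detail the contents of the lookup table; for orientation, the quantity the marker registration step must deliver is the camera pose relative to the square marker, which a standard (non-LUT) baseline recovers from the four detected corners with a PnP solver, as sketched below under assumed camera intrinsics. Presumably the LUT approach replaces most of this online numerical solve with precomputed entries, which is what keeps the per-frame cost low on mobile hardware; the sketch is explicitly not the proposed algorithm.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Baseline square-marker pose (NOT the LUT method): given the four detected
// corner positions of a marker with known side length and the camera
// intrinsics, recover the marker pose in camera coordinates.
bool markerPose(const std::vector<cv::Point2f>& corners,  // detected image corners, ordered
                float side,                               // marker side length
                const cv::Mat& K,                         // 3x3 camera matrix
                const cv::Mat& distCoeffs,                // lens distortion coefficients
                cv::Mat& rvec, cv::Mat& tvec)             // output pose (Rodrigues rotation, translation)
{
    const float h = side * 0.5f;
    // Marker model: a square of the given side length, centred at the origin
    // in the z = 0 plane, corners listed in the same order as `corners`.
    const std::vector<cv::Point3f> model = {
        {-h,  h, 0.0f}, { h,  h, 0.0f}, { h, -h, 0.0f}, {-h, -h, 0.0f}
    };
    return cv::solvePnP(model, corners, K, distCoeffs, rvec, tvec);
}
```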
     Finally, for product assembly in a mixed virtual-real environment, an AR assembly prototype system is developed in which augmented information guides the user through the correct assembly operations. This prototype system validates the effectiveness of the proposed 3D object registration methods for AR.
