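Research on a Face-Oriented Technique for Nonrigid Object Understanding and Motion Analysis

Author: Hou Yunshu (侯云舒)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 图像解释 ; 运动分析 ; 统计模型 ; ASM/AAM ; ICA算法 ; 光流估计 ; 不确定性分解 ; 子空间约束
English keywords: image interpretation ; motion analysis ; statistical model ; ASM/AAM ; ICA algorithm ; optical flow estimation ; uncertainty factorization ; subspace constraint
Degree year: 2003
Advisor: Zhao Rongchun (赵荣椿)
Discipline code: 081203
Degree-granting institution: Northwestern Polytechnical University (西北工业大学)
Thesis submission date: 2003-02-20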

Abstract

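Segmentation and interpretation of static images and motion analysis of image sequences are two fundamental problems in computer vision. Many researchers have studied these areas in depth and proposed a large number of effective methods, which play an increasingly important role in industry and everyday life.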
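     This thesis addresses both the segmentation and interpretation of static images and the motion analysis of image sequences. The author built a fully automatic system for object extraction, interpretation, and motion analysis on face images. Functionally, the work covers three parts: interpretation of the face object, motion analysis of the face region, and tracking of facial feature points.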
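     For segmentation and interpretation of static images, the author analyzes the currently popular methods based on statistical models and takes ASM/AAM as the starting point. The technique is analyzed, generalized, and combined in several ways, improved in various respects, and applied to the segmentation and interpretation of face images; experiments demonstrate its efficiency and robustness.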
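As a rough illustration of the statistical shape constraint that ASM-style models rely on, the sketch below projects a candidate shape onto a linear shape model and clamps each parameter to a plausible range. It is a minimal sketch, not the thesis's implementation: the function name `project_to_shape_model`, the flat `[x0, y0, x1, y1, ...]` layout, and the limit of 3 standard deviations are assumptions chosen for the example, and the modes are assumed to be orthonormal and already learned (for instance by PCA on aligned training shapes).

```python
import math

def project_to_shape_model(shape, mean_shape, modes, eigenvalues, limit=3.0):
    """Project a shape onto a linear shape model and clamp its parameters.

    shape, mean_shape: flat lists [x0, y0, x1, y1, ...] (assumed layout)
    modes: orthonormal mode vectors, each the same length as shape
    eigenvalues: training-set variance along each mode
    limit: clamp each parameter to +/- limit * sqrt(eigenvalue) (assumed value)
    """
    diff = [s - m for s, m in zip(shape, mean_shape)]
    params = []
    for mode, lam in zip(modes, eigenvalues):
        # b_i = p_i . (x - mean): coordinate of the shape along mode i
        b = sum(p * d for p, d in zip(mode, diff))
        bound = limit * math.sqrt(lam)
        params.append(max(-bound, min(bound, b)))
    # Rebuild the constrained shape: x = mean + sum_i b_i * p_i
    fitted = list(mean_shape)
    for mode, b in zip(modes, params):
        for k, p in enumerate(mode):
            fitted[k] += b * p
    return params, fitted

# Toy model: two landmarks, one mode that slides the second landmark along x.
mean_shape = [0.0, 0.0, 1.0, 0.0]
modes = [[0.0, 0.0, 1.0, 0.0]]
eigenvalues = [0.04]  # standard deviation 0.2 along the mode
params, fitted = project_to_shape_model([0.0, 0.0, 3.0, 0.5], mean_shape, modes, eigenvalues)
```

In the toy call, the candidate shape lies far outside the training variation, so its single parameter is clamped and the component the model cannot represent (the vertical offset of the second landmark) is discarded.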
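     For region motion analysis of image sequences, the author starts from the basic L-K algorithm and derives a fast inverse compositional algorithm that supports arbitrary transformations. The algorithm is applied to motion analysis of face image sequences, where it tracks the target in real time and efficiently supports arbitrary affine transformations of the target.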
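The idea behind the inverse compositional scheme can be shown on a deliberately reduced problem. The sketch below estimates a single 1D translation between a template and a signal; it is not the affine tracker described in the thesis, and the names `sample`, `inverse_compositional_shift`, and `bump` are hypothetical. The point it illustrates is that the template gradient and the Hessian are computed once, outside the loop, and that each update composes the current warp with the inverse of the incremental warp (for a pure translation, `p -= dp`).

```python
import math

def sample(signal, x):
    """Linearly interpolate a 1D signal at real-valued x (clamped to the valid range)."""
    x = max(0.0, min(len(signal) - 1.0, x))
    i = min(int(x), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]

def inverse_compositional_shift(image, template, origin, iterations=20, tol=1e-6):
    """Estimate p such that image(origin + i + p) is close to template[i]."""
    n = len(template)
    # Precomputed once: template gradient (central differences, zero at the
    # two end samples) and the 1x1 Gauss-Newton Hessian.
    grad = [0.0] * n
    for i in range(1, n - 1):
        grad[i] = 0.5 * (template[i + 1] - template[i - 1])
    hessian = sum(g * g for g in grad)
    p = 0.0
    for _ in range(iterations):
        # Steepest-descent update from the error image I(W(x; p)) - T(x)
        rhs = sum(grad[i] * (sample(image, origin + i + p) - template[i])
                  for i in range(n))
        dp = rhs / hessian
        # Compose with the inverse of the incremental warp; for a pure
        # translation this reduces to subtracting dp.
        p -= dp
        if abs(dp) < tol:
            break
    return p

def bump(x):
    """Smooth synthetic signal used only for this demonstration."""
    return math.exp(-0.5 * ((x - 50.0) / 10.0) ** 2)

image = [bump(x) for x in range(100)]
template = [bump(20 + i + 2.3) for i in range(60)]  # window shifted by 2.3 samples
estimated_shift = inverse_compositional_shift(image, template, origin=20)
```

For a richer warp such as an affine transformation, the same structure applies with a Jacobian per pixel and a small matrix Hessian in place of the scalar one; that extension is not shown here.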
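     For point motion analysis of image sequences, the author starts from the classical optical flow equation, combines uncertainty factorization theory with subspace optical flow theory, and brings in the idea of the inverse compositional algorithm from region motion analysis to obtain efficient point motion estimation. Experiments show that the algorithm can effectively track target points with 2D texture, 1D texture, or even almost no texture, that is, points with degenerate structure (in the extreme case, it suffices that the points do not all degenerate along the same direction).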
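The distinction between 2D, 1D, and texture-free points can be made concrete with the 2x2 matrix of summed gradient products that appears in the least-squares solution of the optical flow equation over a window. The sketch below classifies a window by the eigenvalues of that matrix. It is an illustration of the degeneracy the paragraph refers to, not the tracking algorithm of the thesis; the function names and the threshold value are assumptions.

```python
import math

def structure_tensor(gradients):
    """Sum Ix*Ix, Ix*Iy, Iy*Iy over a window given as a list of (Ix, Iy) pairs."""
    a = sum(ix * ix for ix, _ in gradients)
    b = sum(ix * iy for ix, iy in gradients)
    c = sum(iy * iy for _, iy in gradients)
    return a, b, c

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean + radius, mean - radius

def texture_type(gradients, threshold=1e-3):
    """Label a window as '2D', '1D', or 'flat' (threshold is an assumed value)."""
    lam_max, lam_min = eigenvalues_2x2(*structure_tensor(gradients))
    if lam_min > threshold:
        return "2D"    # flow is constrained in both directions (corner-like)
    if lam_max > threshold:
        return "1D"    # flow is constrained only across the edge (aperture problem)
    return "flat"      # no usable gradient information

corner = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
edge = [(1.0, 1.0), (2.0, 2.0)]
flat = [(0.0, 0.0)] * 3
labels = [texture_type(w) for w in (corner, edge, flat)]
```

A point labeled "1D" still constrains motion along its gradient direction, which is the kind of directional confidence that a covariance-weighted estimator can exploit when it is combined with constraints shared across many points.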
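     By combining the three algorithms above, the author implemented a fully automatic object extraction and motion analysis system, which has been applied to face video. It automatically extracts the face region from the video, obtains a 2D interpretation of the face, tracks the face in real time, and provides semi-dense point correspondences. The semi-dense correspondence step addresses correspondence, a key difficulty in the SFM problem, and lays a solid foundation for subsequent automatic 3D analysis of faces; the 2D interpretation and motion analysis of the face can find further use in areas such as object-based video compression.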
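     Mathematically, the author studies in depth the use of statistical models, subspace theory, and matrix covariance weighting in computer vision, and makes some contributions there. Notably, the algorithms in this thesis are effective not only for faces but also for a class of rigid/nonrigid objects such as hands and cars, for which they perform motion analysis and interpretation efficiently.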
Image interpretation and video motion estimation and analysis are key problems in computer vision and have attracted intense interest from many researchers. A great many algorithms have been proposed, and they show promising results in many applications in everyday work and life.
    This thesis focuses on image interpretation and motion analysis. A fully automatic face interpretation and motion analysis system is built, whose work can be divided into three parts: face interpretation, face region motion estimation, and facial feature point tracking.
    For face image interpretation, we take ASM/AAM as the basis, generalize it in several ways, and apply it successfully to face images; the results show the efficiency and robustness of the algorithm. For motion estimation and analysis, we analyze the L-K algorithm in full and generalize it to the inverse compositional algorithm, which can track a moving object in real time and supports arbitrary image warping. For facial feature point tracking, we introduce uncertainty factorization and subspace optical flow estimation and fuse them with the idea of the inverse compositional algorithm. The result is an efficient and robust algorithm. Experiments show that the proposed algorithm can properly track points with degenerate texture, that is, points with only 1D texture or even little texture. It thus provides a unified approach for tracking corner-like points together with points along linear structures in the image. It also provides semi-dense correspondence, which addresses one of the key problems of SFM. The image interpretation and motion estimation results can also potentially be used in object-based video coding.
    In this thesis we examine statistical models, covariance-weighted estimation, and subspace-constrained optical flow in depth, and apply these techniques successfully in the automatic face interpretation and motion analysis system. It is also worth noting that the technique can be used not only for faces but also for a class of rigid/nonrigid objects, such as hands and cars.
