Research on Key Technologies of Video Coding Based on Facial Feature Localization and Modeling Theory
Abstract
As one of the key features distinguishing humans from other creatures, the human face acts as the main carrier of information in interpersonal communication and social activities, so a comprehensive and in-depth study of the face is of great theoretical and practical significance. With the rise of real-time multimedia services, applications such as video conferencing, video telephony, and news broadcasting are all directly or indirectly related to the human face, and the importance of face research keeps growing as these applications spread. In the video coding and communication community such applications are usually summarized as "conversational video sequences". Taking conversational video sequences as its subject, this thesis studies video compression methods and technical approaches that integrate face detection, facial feature localization, and face modeling theory.
     In the classic video compression framework, all frames and coding units are compressed sequentially and treated as equally important. Compression ratio and peak signal-to-noise ratio (PSNR) have long been the two basic indexes for evaluating a coding algorithm, but as research progressed it became clear that the coding quality of the region of interest (ROI) must also be considered: users tend to judge the acceptability of a coded video directly by their subjective impression of the ROI. How to guarantee, and improve, the coding quality of the human face ROI in conversational video sequences is therefore a pressing frontier topic in conversational video coding.
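     To make the ROI-oriented evaluation above concrete, the short sketch below computes PSNR either over a whole luma plane or restricted to a face-ROI mask. It is only an illustration of the metric being discussed; the function name, the 8-bit peak value, and the boolean-mask convention are assumptions of this example, not anything specified in the thesis.

import numpy as np

def psnr(ref, recon, mask=None, peak=255.0):
    """PSNR of a reconstructed luma plane, optionally restricted to an ROI mask."""
    ref = ref.astype(np.float64)
    recon = recon.astype(np.float64)
    err = (ref - recon) ** 2
    if mask is not None:
        err = err[mask]              # keep only the ROI pixels (boolean mask)
    mse = err.mean()
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak * peak / mse)

# Hypothetical usage: psnr(ref_y, dec_y) for the whole frame,
# psnr(ref_y, dec_y, mask=face_mask) for the face ROI only.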
     In essence, the limits on coding resources such as network bandwidth and computing power, together with the loss of useful information during transmission, are the chief factors that restrict the picture quality of coded video, and their impact is most pronounced in real-time conversational coding over low-bandwidth, high-bit-error-rate channels. This thesis therefore investigates two encoding strategies that give preferential weight to the human face ROI and one decoder-side error concealment approach, with the aim of achieving the best subjective and objective quality of the face ROI under given channel conditions.
     First, the thesis proposes a bit allocation and resource optimization scheme that protects the human face ROI and its features. The scheme involves three preprocessing steps. To extract the face ROI quickly, the rich motion of the face region in conversational sequences is used to prune the large pyramid of candidate sub-images in the conventional Adaboost face detector. To guarantee the accuracy of the extracted ROI, a complementary verification based on skin-color statistics is performed. To locate the macroblocks (MBs) covering the face contour and the other facial features, the parameter choices of the Snake algorithm and the Active Shape Model (ASM), including search range, convergence direction, and the energy-equilibrium stopping condition, are optimized. On the basis of assigning a bit-allocation priority to each MB according to facial geometry, the scheme builds a relatively precise MB-level adaptive prediction model for the mean absolute difference (MAD) and a quantization parameter (QP) updating rule, which together realize the weighted bit allocation; further resource optimization is obtained from a detailed analysis of MB coding modes and other coding options. Simulation results show that the scheme extracts the face ROI quickly, detects the related facial features with good accuracy, and allocates bits and other coding resources effectively, so that the coding quality of the face ROI and its feature locations is well preserved. Compared with the baseline bit allocation algorithm in JM9.8 and with bit allocation methods from the related literature, the PSNR of the face ROI is improved at the same coding bit rate. Combining bit allocation with resource optimization also narrows the mismatch between the frame-level target bits and the actual bits and reduces the overall encoding time, and subjective tests further confirm that the scheme delivers visually better reconstruction quality.
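     As a rough illustration of the MB-level, ROI-weighted rate control idea described above (not a reproduction of the thesis's model), the sketch below predicts MAD with a linear model, splits a frame's bit budget in proportion to priority times predicted MAD, and inverts a simple quadratic rate-quantization model to obtain a QP. The priority values, model coefficients, and clipping range are placeholders chosen for this example.

import numpy as np

# Illustrative priority classes: facial-feature MBs get the largest share,
# the rest of the face ROI an intermediate share, background the smallest.
PRIORITY = {"feature": 2.0, "face": 1.5, "background": 1.0}

def predict_mad(prev_mad, a1=1.0, a2=0.0):
    """Linear MAD predictor, MAD_hat = a1 * MAD_prev + a2; in a real encoder
    a1 and a2 would be refitted after each coded frame."""
    return a1 * prev_mad + a2

def allocate_mb_bits(frame_bits, mb_classes, prev_mads):
    """Split the frame-level bit budget over MBs in proportion to
    priority * predicted MAD (a stand-in for texture/motion complexity)."""
    mad_hat = np.array([predict_mad(m) for m in prev_mads])
    weight = np.array([PRIORITY[c] for c in mb_classes]) * np.maximum(mad_hat, 1e-3)
    return frame_bits * weight / weight.sum()

def qp_for_target(target_bits, mad_hat, x1=5000.0, x2=0.0, qp_min=10, qp_max=46):
    """Invert a quadratic R-Q model, R = x1*MAD/Qstep + x2*MAD/Qstep^2,
    then map the quantization step to an H.264-style QP index."""
    r = max(target_bits, 1.0)
    c1, c2 = x1 * mad_hat, x2 * mad_hat
    qstep = (c1 + np.sqrt(c1 * c1 + 4.0 * c2 * r)) / (2.0 * r)
    qp = int(round(6.0 * np.log2(max(qstep, 0.1) / 0.625)))   # Qstep doubles every 6 QP
    return int(np.clip(qp, qp_min, qp_max))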
     Second, the thesis reviews the global rate-distortion optimization (RDO) problem in video coding and its traditional solutions, and discusses why coding dependencies should be taken into account during encoding. By simplifying the coding dependencies of a conversational sequence to the temporal dependency of the face ROI, a global RDO framework is proposed that combines joint optimization of the face ROI with independent optimization of the non-face region. The framework fits naturally into a common one-pass coding structure: the independent part still follows the conventional RDO rule, the joint part accounts for the temporal propagation of face-ROI distortion into future frames, and the two parts are linked through a new Lagrange multiplier. To accumulate the total distortion caused by a given face ROI in the joint optimization, a surrogate temporal-propagation chain for the face ROI is constructed by forward motion search. On top of this chain, a temporal distortion-propagation model for the face ROI is developed, in which a characteristic function built on the Laplacian distribution of the transform residuals estimates the quantization distortion from the motion-compensated prediction distortion, thereby reducing the computational complexity. Simulations show that the propagation chain is built quickly and reasonably, that the distortion-propagation model estimates the spread of distortion well, and that the framework offers an effective way to perform global RDO on the face ROI of conversational sequences. Compared with the independence-based RDO in JM15.1 and with a dependency-aware RDO-Q method from the literature, the framework achieves simultaneous BDPSNR (Bjontegaard delta PSNR) gains or BDBR (Bjontegaard delta bit rate) savings for both the face ROI and the whole sequence.
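     The sketch below only illustrates the two ingredients named above: estimating the quantization distortion of Laplacian-distributed transform residuals from their measured statistics, and accumulating the distortion a face ROI contributes along a temporal chain. The numerically integrated quantizer model and the geometric "leak" factor are stand-ins chosen for this example, not the thesis's statistical propagation model.

import numpy as np

def laplacian_scale(residuals):
    """ML estimate of the Laplacian scale b from motion-compensated prediction
    residuals: for a Laplace(0, b) source, E|X| = b."""
    return np.mean(np.abs(residuals))

def expected_quant_distortion(b, qstep, offset=0.5, grid=20001, span=50.0):
    """Numerically evaluate E[(x - Q(x))^2] for a uniform quantizer with step
    qstep applied to a Laplace(0, b) source. offset=0.5 gives plain rounding;
    a dead-zone quantizer would use a smaller offset."""
    x = np.linspace(-span * b, span * b, grid)
    pdf = np.exp(-np.abs(x) / b) / (2.0 * b)
    q = np.sign(x) * np.floor(np.abs(x) / qstep + offset) * qstep
    return np.trapz((x - q) ** 2 * pdf, x)

def propagated_roi_distortion(chain_distortions, leak=0.9):
    """Total distortion a face-ROI area contributes to later frames along a
    pre-built temporal propagation chain, assuming a fraction `leak` of each
    frame's error survives motion-compensated prediction (a crude placeholder)."""
    return sum((leak ** k) * d for k, d in enumerate(chain_distortions))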
     Third, the thesis studies error concealment for conversational video and proposes a spatial error concealment strategy aided by a realistic face model, which comprises three parts. First, exploiting the fact that the efficiency of Active Appearance Model (AAM) fitting depends closely on the initial fitting position (center and orientation) and on the fitting instances (shape and appearance), a coarse facial-feature localization step estimates the in-plane and out-of-plane (profile) rotation angles, from which the initial center, orientation, and shape instance of the AAM are derived; the appearance instance is then chosen by texture similarity. Tuning these initial parameters yields an improved AAM-based facial feature point extraction algorithm with more precise feature points. Second, pose adjustment, shape matching, and texture mapping algorithms are designed around the extracted AAM feature points and the Candide-3 generic wire-frame face model, giving a fast method for building a realistic face model. Third, each damaged MB is classified according to a pre-concealment result and the availability of the realistic face model, and an appropriate spatial concealment algorithm is selected adaptively; in particular, for face-ROI texture blocks the strategy searches the planar projection of the face model for the best replacement block. Simulations show that the improved AAM is more accurate than the original AAM in facial feature point extraction, and that the face modeling method is fast, convenient, and produces a strongly realistic result, offering a reasonable solution to the ill-posed problem of recovering facial depth information from a single 2D image. Compared with the spatial bilinear interpolation and adaptive directional interpolation methods implemented on JM17.1, the model-aided spatial concealment achieves satisfactory concealment of damaged blocks under both interleaved and dispersed (checkerboard) packetization, improves the subjective and objective quality of the face ROI, and to some extent solves the problem of recovering a lost face ROI, especially when parts of the facial features are lost.
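     To show the general shape of the adaptive dispatch described above, the sketch below conceals a lost 16x16 MB either by bilinear interpolation from its received neighbours or, for face-texture blocks, by searching a rendered face-model texture map for a border-matching replacement. The class labels, the border-matching criterion, and the way the face map would be rendered are assumptions made for this example, not the thesis's algorithms.

import numpy as np

def bilinear_conceal(top, bottom, left, right):
    """Fill a 16x16 MB by bilinear interpolation from the boundary rows/columns
    of its four correctly received neighbours (each a length-16 array)."""
    mb = np.zeros((16, 16))
    for y in range(16):
        for x in range(16):
            wy, wx = (y + 1) / 17.0, (x + 1) / 17.0
            mb[y, x] = ((1 - wy) * top[x] + wy * bottom[x]
                        + (1 - wx) * left[y] + wx * right[y]) / 2.0
    return mb

def best_block_from_face_map(face_map, neighbour_ring, step=4):
    """Scan a rendered face-model texture map for the 16x16 block whose own
    border best matches the surviving ring of pixels around the lost MB (SAD)."""
    best, best_cost = None, np.inf
    h, w = face_map.shape
    for y in range(0, h - 15, step):
        for x in range(0, w - 15, step):
            cand = face_map[y:y + 16, x:x + 16]
            ring = np.concatenate([cand[0, :], cand[-1, :], cand[:, 0], cand[:, -1]])
            cost = np.abs(ring - neighbour_ring).sum()
            if cost < best_cost:
                best, best_cost = cand, cost
    return best

def conceal_mb(mb_type, neighbours, face_map=None):
    """Dispatch: face-texture MBs are filled from the face-model map when one is
    available; everything else falls back to bilinear interpolation."""
    top, bottom, left, right = neighbours
    if mb_type == "face_texture" and face_map is not None:
        ring = np.concatenate([top, bottom, left, right])
        return best_block_from_face_map(face_map, ring)
    return bilinear_conceal(top, bottom, left, right)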
