Research on Multimedia Semantic Extraction Methods and Their Application in Video Watermarking
Abstract
With the rapid development of computer and network technology, multimedia data such as video and images are growing geometrically, and the demand for this visual media content is becoming ever greater and broader. How to retrieve information from such vast data resources has therefore become a current research focus. Most existing retrieval techniques, however, operate on low-level visual features, which are far removed from the high-level semantic concepts that people actually understand, and this severely limits retrieval performance in practice. The semantic content of multimedia data cannot be described accurately by low-level visual features; that is, a "semantic gap" separates the low-level visual features from the semantics they carry. How to bridge this gap and extract semantic information effectively has become a pressing problem in multimedia research.
     First, the thesis reviews the research and development of content-based information retrieval (CBIR), and introduces the theory underlying semantic extraction together with the methods in common use, including approaches based on machine learning, on relevance-feedback learning, and on domain-specific knowledge. It then studies and implements two typical machine-learning-based methods for image semantic extraction: one based on the Support Vector Machine (SVM) and one based on the Coherent Language Model (CLM). Experimental results show that both methods extract image semantics effectively.
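     As an illustration of the first of these, the sketch below maps a low-level color-histogram feature to a semantic label with a one-vs-rest SVM. It is a minimal sketch of the general SVM annotation technique only, not the thesis's pipeline: the histogram feature, the label set, and the random training data are placeholder assumptions.

```python
# Minimal sketch of SVM-based image annotation: each image is reduced to a
# low-level feature vector (here a joint color histogram) and a one-vs-rest
# SVM maps it to a semantic label. Features, labels, and data are placeholders.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def color_histogram(image, bins=8):
    """Quantize an RGB image (H x W x 3, uint8) into a normalized histogram."""
    quantized = (image // (256 // bins)).reshape(-1, 3).astype(int)
    index = (quantized[:, 0] * bins + quantized[:, 1]) * bins + quantized[:, 2]
    hist = np.bincount(index, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

# Placeholder training set: random "images" with hypothetical semantic labels
# (0 = sky, 1 = grass, 2 = building -- illustrative only).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 3, size=60)

X = np.stack([color_histogram(im) for im in images])
clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, labels)

test = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print("predicted semantic label:", clf.predict([color_histogram(test)])[0])
```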
     Second, the thesis proposes a video semantic extraction method based on fuzzy associative classification. The method introduces fuzzy concepts to overcome the "sharp boundary" problem of association rule mining; it treats associative classification rule mining as a constrained optimization problem and constructs an adaptive penalty affinity function to assess antibody quality more accurately; it adopts a mixed dual-mutation operator to obtain better global and local search ability, and an aging operator that preserves population diversity while reducing computational complexity. The method is applied to extracting video motion semantics and texture semantics, with satisfactory experimental results.
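     The schematic sketch below illustrates, under stated assumptions, the immune-inspired search this paragraph describes: each antibody encodes a candidate fuzzy rule, the affinity function adds an adaptive penalty whose weight grows over generations, mutation mixes a global (uniform) and a local (Gaussian) operator, and an aging operator retires old antibodies. The toy data and all numeric settings are illustrative, not the thesis's.

```python
# A schematic sketch (with assumed toy data and settings, not the thesis's
# exact algorithm) of the immune-inspired fuzzy rule search described above.
import random

def fuzzy_membership(x, center, width=0.2):
    """Triangular membership: softens the hard 'x > threshold' rule boundary."""
    return max(0.0, 1.0 - abs(x - center) / width)

# Toy data: (feature value, class) pairs standing in for video motion features.
random.seed(0)
data = [(random.gauss(0.3, 0.1), 0) for _ in range(50)] + \
       [(random.gauss(0.7, 0.1), 1) for _ in range(50)]

def affinity(center, generation):
    """Rule quality with an adaptive penalty for violating a support constraint."""
    fired = [(fuzzy_membership(x, center), c) for x, c in data]
    support = sum(m for m, _ in fired) / len(data)
    confidence = sum(m for m, c in fired if c == 1) / (sum(m for m, _ in fired) + 1e-9)
    # The penalty weight grows with the generation index, so infeasible
    # antibodies are tolerated early in the search and punished late.
    violation = max(0.0, 0.2 - support)
    return confidence - (1.0 + generation) * violation

# Each antibody encodes one candidate fuzzy rule ("IF motion is around
# `center` THEN class 1") plus an age for the aging operator.
population = [{"center": random.random(), "age": 0} for _ in range(20)]
for gen in range(30):
    for ab in population:
        ab["age"] += 1
    clones = []
    for ab in population:
        # Mixed mutation: occasional uniform (global) jump, otherwise a
        # small Gaussian (local) perturbation.
        if random.random() < 0.3:
            c = random.random()
        else:
            c = min(1.0, max(0.0, ab["center"] + random.gauss(0.0, 0.05)))
        clones.append({"center": c, "age": 0})
    # Aging operator: antibodies past a fixed lifespan are retired, which
    # keeps the population diverse without extra bookkeeping.
    survivors = [ab for ab in population if ab["age"] < 5] + clones
    survivors.sort(key=lambda ab: affinity(ab["center"], gen), reverse=True)
    population = survivors[:20]

print("best fuzzy rule center:", round(population[0]["center"], 3))
```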
     Finally, the thesis applies high-level semantics to video digital watermarking, proposing a robust watermarking method for the AVS (Audio Video coding Standard) compressed domain based on video semantics. The method uses the extracted motion semantics to generate a dynamic semantic watermark online; it adaptively selects shots of interest according to motion semantics and I-frames of interest according to texture semantics; exploiting the masking properties of the human visual system, it chooses regions of intense and of slow motion as regions of interest, and embeds the watermark into the mid-frequency DCT coefficients of the luminance sub-blocks' prediction residuals in the I-frames of interest; the embedding strength is controlled adaptively by the video's texture features. Experiments and analysis show that the method is robust not only to common attacks but also to video-specific attacks such as frame reordering, intra-frame cropping, and frame deletion.
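     Under simplifying assumptions, the sketch below shows how one such embedding step might look: a single watermark bit is placed in a mid-frequency DCT coefficient of an 8x8 luminance prediction-residual block, with the embedding strength scaled by the block's texture. The plain residual blocks, the quantization-index-modulation rule, and the chosen coefficient position are stand-ins for the AVS codec internals, not the thesis's exact method.

```python
# Simplified sketch of the embedding step: one watermark bit goes into a
# mid-frequency DCT coefficient of an 8x8 luminance prediction-residual
# block via quantization index modulation (QIM), with the quantization
# step scaled by the block's texture. The AVS codec internals are assumed away.
import numpy as np
from scipy.fft import dctn, idctn

MID_FREQ = (2, 3)  # one illustrative mid-frequency position in the 8x8 block

def _adaptive_step(coeffs, base_step):
    # Texture-adaptive strength: busier blocks mask larger changes. The
    # estimate excludes the DC term and the watermarked coefficient, so the
    # extractor recomputes exactly the same step after embedding.
    mask = np.ones(coeffs.shape, dtype=bool)
    mask[0, 0] = False
    mask[MID_FREQ] = False
    return base_step * (1.0 + np.std(coeffs[mask]) / 16.0)

def embed_bit(residual_block, bit, base_step=4.0):
    """Embed one bit into the mid-frequency coefficient of a residual block."""
    coeffs = dctn(residual_block.astype(float), norm="ortho")
    step = _adaptive_step(coeffs, base_step)
    q = np.round(coeffs[MID_FREQ] / step)
    if int(q) % 2 != bit:  # snap to an even/odd multiple of `step` per the bit
        q += 1.0 if coeffs[MID_FREQ] >= q * step else -1.0
    coeffs[MID_FREQ] = q * step
    return idctn(coeffs, norm="ortho")

def extract_bit(residual_block, base_step=4.0):
    coeffs = dctn(residual_block.astype(float), norm="ortho")
    step = _adaptive_step(coeffs, base_step)
    return int(np.round(coeffs[MID_FREQ] / step)) % 2

rng = np.random.default_rng(1)
block = rng.integers(-20, 20, size=(8, 8)).astype(float)
marked = embed_bit(block, 1)
print("recovered bit:", extract_bit(marked))  # -> 1
```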
     The thesis concludes with a summary of the work and directions for future research.
