面向结构化数据的视频检索研究 (Research on Video Retrieval Oriented to Structured Data)
Abstract
In recent years the amount of video data has grown explosively, and video plays an increasingly important role in daily life; online video sharing is expected to remain a hot topic for years, even decades, to come. As a result, video content analysis and video retrieval have become central issues in video research. Content-based video retrieval (CBVR) is a technique that combines theoretical depth, practical value, and technical challenge; after more than a decade of research it has made great progress, and several prototype systems have been developed and used in small commercial search engines. Generalized video structuring plays a key role in CBVR. Because raw video is an unstructured data stream, retrieval first requires organizing the video into structured data with an appropriate model, and then analyzing, indexing, and querying it on the basis of that structure. The objective of this thesis is to study the structural characteristics of video content and to design efficient machine learning algorithms for high-level semantic understanding that exploit these characteristics. Such algorithms aim to narrow the "semantic gap" between low-level features and high-level semantics automatically, or with little manual labeling, and ultimately to improve retrieval performance.
     In this thesis, we take the hierarchical structure of video as the main thread and study the image-level, shot-level, and scene-level structures, proposing machine learning algorithms tailored to each level. The main contributions are summarized as follows:
     1. For image-level retrieval based on global information, we propose to combine AdaBoost with SVM: samples are drawn repeatedly, classification accuracy is used as the criterion for feature selection so that only a small number of retrieval-relevant features are kept, and the resulting weak classifiers are boosted into a strong classifier, thereby fusing multiple types of features effectively.
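     The abstract only outlines this boosting-based feature selection, so here is a minimal sketch of one way such a scheme could be organized: each boosting round trains one linear SVM per candidate feature group, keeps the group with the lowest weighted error (i.e. highest weighted classification accuracy), and the kept groups plus their weak classifiers form the strong classifier. The feature-group layout, the re-weighting details, and all names are illustrative assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch: AdaBoost-style feature selection with SVM weak learners.
# Assumption (not from the thesis): one linear SVM per candidate feature group
# per round; the group with the lowest weighted error is selected.
import numpy as np
from sklearn.svm import SVC

def adaboost_svm_feature_selection(X_groups, y, n_rounds=10):
    """X_groups: dict {feature_name: (n_samples, dim) array}; y: labels in {-1, +1}."""
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                 # sample weights
    ensemble = []                           # (feature_name, classifier, alpha)
    for _ in range(n_rounds):
        best = None
        for name, X in X_groups.items():
            clf = SVC(kernel="linear")
            clf.fit(X, y, sample_weight=w)
            pred = clf.predict(X)
            err = np.sum(w * (pred != y)) / np.sum(w)   # weighted error = 1 - weighted accuracy
            if best is None or err < best[2]:
                best = (name, clf, err, pred)
        name, clf, err, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)           # weak-learner weight
        w *= np.exp(-alpha * y * pred)                  # re-weight misclassified samples up
        w /= w.sum()
        ensemble.append((name, clf, alpha))
    return ensemble                                     # selected features + strong classifier

def predict(ensemble, X_groups):
    """Strong classifier: weighted vote of the selected SVM weak learners."""
    score = sum(alpha * clf.decision_function(X_groups[name])
                for name, clf, alpha in ensemble)
    return np.sign(score)
```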
     2. For image-level retrieval based on regional information, we model the image with multiple-instance learning, which falls within the structural learning framework, and introduce multiple-instance active learning (MIAL) to reduce manual labeling and alleviate the shortage of labeled samples. We analyze the characteristics of MIAL and categorize it into three paradigms: bag-level, instance-level, and mixture-level active learning. For bag-level MIAL we propose a sample selection strategy that combines a statistic of the number of instances in each bag with the uncertainty of the bag. Experimental results demonstrate the effectiveness of the proposed algorithm.
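     As a concrete illustration of the bag-level selection strategy described above, the sketch below scores each unlabeled bag by combining classifier uncertainty (distance to an SVM decision boundary on a bag-level representation) with a statistic of the bag's instance count. The mean-of-instances bag representation, the normalization, and the mixing weight `beta` are assumptions made for illustration, not the exact criterion used in the thesis.

```python
# Illustrative sketch of bag-level multiple-instance active learning (MIAL):
# choose which unlabeled bags to annotate by combining (a) uncertainty of the
# current bag-level classifier and (b) a statistic of each bag's instance count.
import numpy as np
from sklearn.svm import SVC

def bag_feature(bag):
    """bag: (n_instances, dim) array -> simple bag-level representation (mean of instances)."""
    return bag.mean(axis=0)

def select_bags_to_label(labeled_bags, labels, unlabeled_bags, n_query=5, beta=0.5):
    """Return indices of the unlabeled bags to send for manual labeling."""
    X = np.stack([bag_feature(b) for b in labeled_bags])
    clf = SVC(kernel="rbf").fit(X, labels)

    U = np.stack([bag_feature(b) for b in unlabeled_bags])
    uncertainty = 1.0 / (np.abs(clf.decision_function(U)) + 1e-6)   # near the boundary => uncertain

    sizes = np.array([len(b) for b in unlabeled_bags], dtype=float)
    size_score = sizes / sizes.max()        # larger bags carry more (potentially ambiguous) instances

    score = beta * uncertainty / uncertainty.max() + (1 - beta) * size_score
    return np.argsort(-score)[:n_query]
```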
     3. Since the shot is the basic physical unit of video, video retrieval is usually performed at the shot level. We study the intrinsic hierarchical structure of video content and, for the first time, propose the multi-layer multi-instance (MLMI) learning framework, which combines structural learning with multiple-instance learning and can model video content in a natural way. We discuss the key problems that MLMI learning must solve and design a complete framework consisting of several algorithms that address them. First, an MLMI kernel is constructed to measure the similarity of samples under this particular structure. To weight the contributions of individual instances, we then apply the marginalized-kernel idea and derive a marginalized MLMI kernel. Finally, to handle the ambiguity propagation introduced by weak labeling and the multi-layer structure, we propose an MLMI regularization framework with several explicit constraints, namely the hyper-bag prediction error, the sub-layer prediction error, an inter-layer inconsistency measure, and the classifier complexity, which together yield a satisfactory solution to the MLMI learning problem.
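     The abstract names the ingredients of the MLMI framework without giving formulas. The LaTeX sketch below shows one natural way to write them down, assuming a two-layer structure (a shot is a bag of key-frames, each key-frame a bag of region features) and using the standard set-kernel and marginalized-kernel constructions; the exact kernels, weights, and loss terms in the thesis may differ.

```latex
% Nested set kernel for a two-layer bag-of-bags structure (illustrative form).
% A shot S = {F_1,...,F_m} is a bag of key-frames; each key-frame F = {x_1,...,x_r}
% is a bag of region features; k_I is any base kernel on region features.
\[
  k_{\mathrm{MI}}(F, G) = \sum_{x \in F} \sum_{y \in G} k_I(x, y), \qquad
  k_{\mathrm{MLMI}}(S, T) = \sum_{F \in S} \sum_{G \in T} k_{\mathrm{MI}}(F, G).
\]
% A marginalized variant weights instances by a posterior over a hidden
% "relevance" state h, in the spirit of marginalized kernels:
\[
  k_{\mathrm{MMLMI}}(S, T) = \sum_{F \in S} \sum_{G \in T}
     \sum_{x \in F} \sum_{y \in G} p(h{=}1 \mid x)\, p(h{=}1 \mid y)\, k_I(x, y).
\]
% Regularized MLMI objective combining the four constraints named in the abstract
% (hyper-bag loss, sub-layer loss, inter-layer inconsistency, classifier complexity);
% the loss \ell, penalty \Omega_{inc}, and the lambdas are placeholders.
\[
  \min_{f}\;
    \sum_{i} \ell\bigl(f(S_i), Y_i\bigr)
  + \lambda_1 \sum_{i}\sum_{F \in S_i} \ell\bigl(f(F), y_{F}\bigr)
  + \lambda_2 \sum_{i} \Omega_{\mathrm{inc}}\bigl(f(S_i), \{f(F)\}_{F \in S_i}\bigr)
  + \lambda_3 \|f\|_{\mathcal{H}}^{2}.
\]
```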
     4. The scene is the basic semantic unit of video; it is more abstract and general than the shot, so exploiting scene information in semantic understanding benefits semantic-level applications such as video retrieval and management. We propose an energy-minimization-based scene segmentation (EMS) algorithm that takes into account both the global distribution of time and content and the local temporal continuity. Moreover, we propose a scheme that fuses the scene segmentation results with automatic speech recognition (ASR) transcripts for video retrieval, achieving better performance.
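     To make the energy-minimization formulation more concrete, here is a small sketch that segments a sequence of shot feature vectors into scenes by minimizing a simple energy: a coherence term that prefers visually consistent scenes (a stand-in for the global content/time distribution) plus a boundary term that discourages cutting between locally similar adjacent shots. Both terms, the weight `gamma`, and the dynamic-programming solver are illustrative assumptions, not the EMS algorithm itself; the ASR fusion step is omitted.

```python
# Illustrative sketch of energy-minimization scene segmentation over a shot sequence.
# Energy(segmentation) = sum over scenes of within-scene scatter (coherence term)
#                      + gamma * sum over cuts of similarity of the adjacent shots (boundary term).
# Minimized exactly by dynamic programming over the start of the last scene.
import numpy as np

def scene_cost(feats, i, j):
    """Within-scene scatter of shots i..j-1 (lower = more coherent scene)."""
    seg = feats[i:j]
    return float(np.sum((seg - seg.mean(axis=0)) ** 2))

def cut_penalty(feats, j):
    """Penalty for a scene boundary between shot j-1 and shot j:
    high when the adjacent shots are similar (cutting there is discouraged)."""
    a, b = feats[j - 1], feats[j]
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(cos, 0.0)

def segment_scenes(feats, gamma=2.0):
    """feats: (n_shots, dim) array of shot features -> sorted list of scene start indices."""
    n = len(feats)
    best = np.full(n + 1, np.inf)   # best[j] = minimal energy of segmenting shots 0..j-1
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):          # last scene covers shots i..j-1
            e = best[i] + scene_cost(feats, i, j)
            if i > 0:
                e += gamma * cut_penalty(feats, i)
            if e < best[j]:
                best[j], back[j] = e, i
    bounds, j = [], n               # recover scene starts by backtracking
    while j > 0:
        bounds.append(back[j])
        j = back[j]
    return sorted(bounds)           # always includes 0, the start of the first scene
```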
