Research on Visual Information Annotation with Machine Learning Techniques (基于机器学习方法的视觉信息标注研究)


English title: Research on Visual Information Annotation with Machine Learning Techniques
Author: Zha Zhengjun (查正军)
Thesis level: Doctoral
Discipline: Pattern Recognition and Intelligent Systems
Keywords: Visual Information Annotation; Visual Information Retrieval; Machine Learning; Multi-Concept Learning; Semi-Supervised Learning; Multi-Instance Learning; Concept Property
Degree year: 2009
Advisor: Wang Zengfu (汪增福)
Discipline code: 081104
Degree-granting institution: University of Science and Technology of China
Submission date: 2009-05-01

Abstract

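With advances in storage devices, computer networks, and compression technology, visual information has grown explosively. How to effectively organize, represent, manage, and retrieve this vast body of visual information has become a pressing problem for both the research community and industry. Semantic annotation of visual information in particular has attracted growing attention and is now an active research topic.
     Early visual information annotation was done by hand. Manual annotation is time-consuming and labor-intensive, however, and cannot scale to large collections, which has motivated the search for new annotation techniques. Because machine learning rests on a mature theoretical foundation and can offer both theoretical support and candidate solutions for semantic annotation, learning-based automatic annotation has gradually become the mainstream approach to the problem. This thesis studies machine-learning-based visual information annotation and proposes a series of new annotation algorithms that aim to improve annotation accuracy by exploiting the characteristics of the annotation task, with a view to making annotation practical. The main contributions are as follows:
     1. A visual information annotation framework oriented toward mining the properties of semantic concepts. Concept property mining is introduced into conventional video annotation methods, and annotation refinement algorithms are proposed that combine the statistical correlation and the semantic correlation among concepts. Conventional methods treat the annotation of each semantic concept as a binary classification problem and regard concepts simply as class labels, ignoring properties of the concepts themselves, such as the statistical and semantic correlations among them, and therefore struggle to achieve satisfactory results. By mining concept properties and using them to guide annotation, this thesis effectively improves annotation accuracy.
     2. A new visual information annotation technique based on semi-supervised multi-concept learning. Multi-concept learning is introduced into semi-supervised learning to form a semi-supervised multi-concept learning framework. Based on this framework, two new semi-supervised multi-concept learning algorithms are proposed that jointly exploit the similarity among samples, the correlation among semantic concepts, and the mapping between samples and concepts. The resulting technique overcomes the shortage of training samples while fully exploiting the correlation among concepts, yielding more accurate annotation models.
     3. A first study of visual information annotation based on multi-instance multi-concept learning. As an effective means of resolving data ambiguity, multi-instance learning has been increasingly applied to visual information annotation. Previous multi-instance learning methods, however, are limited to single-concept learning, whereas visual information annotation is inherently a multi-concept learning problem, and this multi-concept nature is precisely the source of the data ambiguity. Multi-instance multi-concept learning approaches annotation from a new angle: it brings the idea of multi-concept learning into multi-instance learning and resolves the ambiguity of visual data more effectively by mining the relationships among concepts, thereby improving annotation accuracy.
     Research on visual information annotation touches on machine learning, computer vision, cognitive science, and other fields. It is hoped that the work in this thesis can also offer some new ideas and methods to these related fields.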
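
As a rough illustration of the first contribution, the sketch below refines per-concept detector scores using statistical correlation estimated from training labels. It is a minimal sketch under stated assumptions, not the thesis's algorithm: the function names `concept_correlation` and `refine_scores`, the Pearson-correlation estimate, and the blending weight `alpha` are all hypothetical choices made for illustration.

```python
from math import sqrt


def concept_correlation(labels):
    """Pearson correlation between concept columns of a 0/1 label matrix.

    labels[i][j] is 1 if training sample i carries concept j, else 0.
    """
    n, k = len(labels), len(labels[0])
    means = [sum(row[j] for row in labels) / n for j in range(k)]
    corr = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            cov = sum((r[a] - means[a]) * (r[b] - means[b]) for r in labels)
            va = sum((r[a] - means[a]) ** 2 for r in labels)
            vb = sum((r[b] - means[b]) ** 2 for r in labels)
            # A concept that never varies carries no correlation information.
            corr[a][b] = cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0
    return corr


def refine_scores(scores, corr, alpha=0.3):
    """Blend each concept's score with evidence from correlated concepts.

    scores[j] is an initial detector confidence in [0, 1] for concept j.
    alpha (assumed, not from the thesis) controls how much context counts.
    """
    k = len(scores)
    refined = []
    for a in range(k):
        total = sum(abs(corr[a][b]) for b in range(k) if b != a)
        if total == 0:
            refined.append(scores[a])
            continue
        # A negatively correlated concept counts as evidence against `a`,
        # so its score is flipped before averaging.
        context = sum(
            abs(corr[a][b]) * (scores[b] if corr[a][b] > 0 else 1.0 - scores[b])
            for b in range(k)
            if b != a
        ) / total
        refined.append((1 - alpha) * scores[a] + alpha * context)
    return refined
```

With concepts such as "sky", "outdoor", and "indoor", a weak "sky" score is pulled upward when "outdoor" scores high and "indoor" scores low, which is the intuition behind treating concepts jointly instead of as independent class labels.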
With advances in storage devices, networks, and compression techniques, large-scale visual data have become available to more and more ordinary users. How to effectively organize, represent, manage, and retrieve these data has become a challenging task in both research and industry. To this end, visual information annotation has attracted increasing attention.
     The most intuitive approach to this task is manual annotation. However, manual annotation is labor-intensive and time-consuming, and it can hardly be applied to a large-scale data set or concept set. Learning-based visual information annotation has therefore become an alternative. In this thesis, we propose several learning-based visual information annotation methods that aim to obtain accurate annotation results automatically. The main contributions are as follows:
     1. We propose a novel visual information annotation framework that performs annotation by discovering the properties of concepts. Based on the framework, two new annotation refinement algorithms are developed, which improve the annotation by leveraging the statistical correlation and the semantic correlation among concepts, respectively. Compared with conventional annotation methods that treat concepts independently, our approaches achieve superior performance on visual information annotation.
     2. Semi-supervised learning methods, which attempt to tackle the problem of insufficient training data, are widely adopted for visual information annotation. Conventional semi-supervised learning methods predominantly focus on the single-concept learning problem. However, visual information annotation is essentially a multi-concept learning task. In this thesis, we propose an innovative semi-supervised multi-concept learning framework. This framework is characterized by simultaneously exploiting the inherent correlation among multiple concepts and the annotation consistency over the sample graph. Based on the proposed framework, we further develop two novel semi-supervised multi-concept learning algorithms. We apply them to visual information annotation and obtain superior performance compared with state-of-the-art semi-supervised approaches.
     3. Recently, multi-instance learning, which attempts to tackle the data ambiguity problem, has been used for visual information annotation. Traditional multi-instance learning algorithms mainly focus on the single-concept learning problem. However, visual information annotation is essentially a multi-concept learning problem. In this thesis, we propose an innovative multi-instance multi-concept learning method that simultaneously captures the connections between semantic concepts and regions as well as the correlation among multiple concepts. Moreover, the proposed approach can also capture other dependencies among the concepts, such as spatial relations. We apply the proposed approach to image annotation and report superior performance compared with key existing methods.
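
The multi-instance setting in item 3 can be sketched as follows: an image is a bag of region feature vectors, and under the standard multi-instance assumption a concept applies to the image if at least one region supports it. This is only an illustrative sketch. The linear per-concept models, the names `region_scores` and `annotate_bag`, and the threshold are assumptions, and the sketch scores each concept independently, whereas the thesis's method additionally models correlation among concepts.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def region_scores(region, concept_weights):
    """Score one region feature vector against every concept's linear model."""
    return {
        concept: sigmoid(sum(w * x for w, x in zip(weights, region)))
        for concept, weights in concept_weights.items()
    }


def annotate_bag(regions, concept_weights, threshold=0.5):
    """Annotate an image given as a bag of regions.

    The bag-level score of a concept is the maximum of its region-level
    scores, so one confident region is enough to assign the concept.
    Also returns which region was responsible, which is how multi-instance
    learning ties image-level labels back to regions.
    """
    bag_scores = {c: 0.0 for c in concept_weights}
    best_region = {c: None for c in concept_weights}
    for idx, region in enumerate(regions):
        for concept, score in region_scores(region, concept_weights).items():
            if score > bag_scores[concept]:
                bag_scores[concept] = score
                best_region[concept] = idx
    labels = sorted(c for c, s in bag_scores.items() if s >= threshold)
    return labels, bag_scores, best_region
```

Taking the maximum over regions is the simplest bag-level rule; it makes the source of ambiguity visible, since an image-level label such as "sky" says nothing about which region contains the sky until the per-region scores are inspected.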
     Visual information annotation is closely related to many different domains, such as machine learning, computer vision, and cognitive science. We hope that our work can also offer some inspiration and methods to these communities.
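
To make the idea of annotation consistency over a sample graph in the second contribution more concrete, here is a minimal sketch of graph-based label propagation with several concepts at once. It uses the well-known local and global consistency iteration of Zhou et al., F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), as a stand-in; it is not one of the two algorithms proposed in the thesis, which additionally exploit correlation among concepts. The Gaussian affinity, the function names, and the parameter values are assumptions.

```python
from math import exp, sqrt


def rbf_affinity(features, sigma=1.0):
    """Gaussian affinity between samples, with a zero diagonal."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = exp(-d2 / (2 * sigma ** 2))
    return w


def propagate_labels(features, labels, alpha=0.8, iterations=50, sigma=1.0):
    """Spread concept labels from labeled to unlabeled samples over a graph.

    labels[i] holds one 0/1 entry per concept; unlabeled samples are all zeros.
    Returns a score matrix of the same shape.
    """
    w = rbf_affinity(features, sigma)
    n, k = len(features), len(labels[0])
    degree = [sum(row) for row in w]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    s = [
        [w[i][j] / sqrt(degree[i] * degree[j]) if w[i][j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    f = [[float(v) for v in row] for row in labels]
    for _ in range(iterations):
        # Each sample absorbs its neighbors' scores while staying anchored
        # to its own initial labels.
        f = [
            [alpha * sum(s[i][j] * f[j][c] for j in range(n))
             + (1 - alpha) * labels[i][c]
             for c in range(k)]
            for i in range(n)
        ]
    return f
```

Each column of the score matrix corresponds to one concept, so a single pass over the graph annotates all concepts for all unlabeled samples; nearby samples end up with similar scores, which is how a handful of labeled examples can compensate for scarce training data.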

