Multi-Label Pattern Classification Techniques and Their Applications in Multimedia Analysis
Abstract
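Automatic concept annotation of multimedia is a key technology for semantic-level video browsing and search. Research in this area has gone through two stages. The first stage used binary classifiers to detect each concept in a concept set and reached a certain level of accuracy, but it completely ignored the relationships between concept classes. The second stage added a semantic fusion step on top of the individual concept detectors, mining the correlations between concepts to improve annotation accuracy. However, this approach carries the classification errors of the first step into the second, causing an "error propagation" problem. To solve these problems, we propose a new method, called Correlative Multi-Label (CML), that simultaneously models the relationship between each individual concept and the low-level features and the relationships among concepts. We compared CML with existing algorithms on the TRECVID dataset and obtained satisfactory results.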
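To make the single-step idea concrete, here is a minimal sketch, assuming a linear model over the whole label vector: hypothetical per-concept weights `W` score the concept-feature relation, hypothetical pairwise weights `C` score concept-concept correlations, labels are encoded as +1/-1, and the best vector is found by exhaustive search. These names, the encoding, and the brute-force inference are illustrative assumptions rather than the thesis's implementation, and exhaustive search is only practical for a few concepts.

```python
from itertools import product


def joint_score(x, y, W, C):
    """Score a whole label vector y (entries +1/-1) for feature vector x.

    W: hypothetical K x D concept-feature weights; concept k adds y[k] * <W[k], x>.
    C: hypothetical K x K concept-correlation weights; each pair i < j adds
       C[i][j] * y[i] * y[j], rewarding agreeing labels when C[i][j] > 0.
    """
    # Concept-feature terms: evidence for or against each concept.
    unary = sum(yk * sum(w * xi for w, xi in zip(Wk, x)) for yk, Wk in zip(y, W))
    # Concept-concept terms over unordered pairs.
    K = len(y)
    pairwise = sum(C[i][j] * y[i] * y[j] for i in range(K) for j in range(i + 1, K))
    return unary + pairwise


def predict(x, W, C):
    """Return the label vector with the highest joint score (brute force over 2^K vectors)."""
    K = len(W)
    return list(max(product((-1, 1), repeat=K), key=lambda y: joint_score(x, y, W, C)))


if __name__ == "__main__":
    x = [1.0, 1.0]
    W = [[2.0, 0.0], [0.0, -0.5]]  # strong evidence for concept 0, weak against concept 1
    print(predict(x, W, [[0.0, 0.0], [0.0, 0.0]]))  # [1, -1]
    print(predict(x, W, [[0.0, 1.0], [1.0, 0.0]]))  # [1, 1]
```

With no correlation weight, the weak negative evidence for the second concept (think "building") wins; a positive weight linking it to the first concept (think "urban") flips it to +1. Independently trained binary detectors have no place to use that inter-concept evidence.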
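     On the other hand, conventional active learning algorithms dynamically build the training set along the sample dimension only. Although this works well for ordinary binary classification, it is not optimal for multi-label problems. We argue that for each selected sample, only some of its effective labels need to be annotated, while the others can be inferred from the relationships between labels. The reason is that, given the label correlations, different labels contribute differently to minimizing the classification error. We therefore propose a method that selects sample-label pairs to minimize a multi-label Bayesian classification error bound; we call it two-dimensional active learning because its selection strategy considers both the sample dimension and the label dimension. Furthermore, because the number of training samples keeps growing over time, a multi-label classifier that relies on retraining would greatly increase the computational cost. We therefore developed an efficient online model that updates the current model using only the newly arrived data, which greatly improves efficiency. We tested these algorithms on two standard datasets and on a real-world dataset obtained from the Corbis website, with satisfactory results.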
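The sample-label selection can be sketched as follows. The sketch assumes that, for every unlabeled sample, a model already supplies a joint posterior over its label vectors (a hypothetical dict from 0/1 tuples to probabilities), that labels already annotated for a sample are handled by conditioning that posterior, and that a candidate pair (sample, label s) is scored by the sum over still-unknown labels i of the mutual information I(y_i; y_s), whose i = s term is the entropy H(y_s). Since I(y_i; y_s) = H(y_i) - H(y_i | y_s), the score adds up how much annotating y_s would reduce the entropy of each unknown label, including its own. This score is an illustrative stand-in for the error-bound criterion, not a reproduction of it, and the helper names are hypothetical.

```python
from collections import defaultdict
from math import log2


def condition(joint, known):
    """Keep label vectors consistent with the already-annotated labels and renormalize."""
    kept = {y: p for y, p in joint.items() if all(y[k] == v for k, v in known.items())}
    z = sum(kept.values())
    return {y: p / z for y, p in kept.items()}


def mutual_info(joint, i, j):
    """I(y_i; y_j) in bits under a joint distribution over label tuples; I(y_i; y_i) = H(y_i)."""
    pij, pi, pj = defaultdict(float), defaultdict(float), defaultdict(float)
    for y, p in joint.items():
        pij[(y[i], y[j])] += p
        pi[y[i]] += p
        pj[y[j]] += p
    return sum(p * log2(p / (pi[a] * pj[b])) for (a, b), p in pij.items() if p > 0)


def select_pair(samples):
    """samples: list of (joint_posterior, known_labels). Returns (sample_index, label_index, score)."""
    best = None
    for n, (joint, known) in enumerate(samples):
        post = condition(joint, known)
        K = len(next(iter(post)))
        unknown = [k for k in range(K) if k not in known]
        for s in unknown:
            # Total entropy reduction over the unknown labels if y_s were annotated.
            score = sum(mutual_info(post, i, s) for i in unknown)
            if best is None or score > best[2]:
                best = (n, s, score)
    return best


if __name__ == "__main__":
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    correlated = {(0, 0): 0.5, (1, 1): 0.5}
    print(select_pair([(independent, {}), (correlated, {})]))      # (1, 0, 2.0)
    print(select_pair([(independent, {}), (correlated, {0: 1})]))  # (0, 0, 1.0)
```

In the second call, once y_0 of the perfectly correlated sample has been annotated, y_1 has no remaining entropy and is never requested, which mirrors the argument that some labels can be inferred rather than annotated.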
Automatic concept annotation of multimedia is key to semantic-level video browsing, search, and navigation. Research on this topic has evolved through two paradigms. The first paradigm used binary classification to detect each individual concept in a concept set. It achieved only limited success, because it did not model the inherent correlations between concepts such as urban and building. The second paradigm added a second step on top of the individual-concept detectors to fuse multiple concepts. However, its performance varies, because errors incurred in the first detection step can propagate to the second fusion step and degrade the overall performance. To address these issues, we first propose a third paradigm that simultaneously classifies concepts and models the correlations between them in a single step, using a novel Correlative Multi-Label (CML) framework. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set and report superior performance for the proposed approach.
     On the other hand, conventional active learning dynamically constructs the training set only along the sample dimension. While this is the right strategy for binary classification, it is sub-optimal for multi-label image classification. We argue that for each selected sample, only some effective labels need to be annotated, while the others can be inferred by exploiting the label correlations: because of these inherent correlations, different labels contribute differently to minimizing the classification error. To this end, we propose to select sample-label pairs, rather than only samples, to minimize a multi-label Bayesian classification error bound. We call this two-dimensional active learning because it considers both the sample dimension and the label dimension. Furthermore, because active learning makes the number of training samples grow rapidly over time, it becomes intractable for an offline learner to retrain a new model on the whole training set. We therefore develop an efficient online learner that adapts the existing model into a new one by minimizing the distance between the two models under a set of multi-label constraints. The effectiveness and efficiency of the proposed method are evaluated on two benchmark datasets and on a realistic image collection from the real-world image-sharing website Corbis.
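The online adaptation step can be illustrated with a deliberately simpler, swapped-in technique: a passive-aggressive-style update that uses the squared Euclidean distance between weight vectors as the model distance and imposes a single margin constraint, namely that the annotated label vector must outscore the current prediction by their Hamming distance. Minimizing ½||w - w_old||^2 subject to w·d ≥ Δ, where d = phi(x, y) - phi(x, ŷ), has the closed form w = w_old + τd with τ = max(0, Δ - w_old·d) / ||d||^2. The feature map `phi` (per-concept copies of x followed by pairwise label products), this distance, and this constraint are assumptions for illustration; they are not the thesis's own model distance or multi-label constraint set.

```python
from itertools import product


def phi(x, y):
    """Hypothetical joint feature map: y_k * x for each concept k, then y_i * y_j for each pair i < j."""
    K = len(y)
    unary = [yk * xi for yk in y for xi in x]
    pairwise = [y[i] * y[j] for i in range(K) for j in range(i + 1, K)]
    return unary + pairwise


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def predict(w, x, K):
    """Highest-scoring label vector under weights w (brute force over 2^K vectors)."""
    return list(max(product((-1, 1), repeat=K), key=lambda y: dot(w, phi(x, y))))


def online_update(w_old, x, y_true):
    """Weights closest to w_old (squared Euclidean distance) such that y_true
    outscores the current prediction by a margin equal to their Hamming distance.
    Only the new example is used; no retraining on past data."""
    y_hat = predict(w_old, x, len(y_true))
    margin = sum(a != b for a, b in zip(y_true, y_hat))
    if margin == 0:
        return list(w_old)  # prediction already matches the annotation
    d = [a - b for a, b in zip(phi(x, y_true), phi(x, y_hat))]
    norm_sq = dot(d, d)
    if norm_sq == 0:
        return list(w_old)  # the two label vectors are indistinguishable under phi
    # Projection of w_old onto the half-space {w : w . d >= margin}.
    tau = max(0.0, margin - dot(w_old, d)) / norm_sq
    return [wi + tau * di for wi, di in zip(w_old, d)]


if __name__ == "__main__":
    w = online_update([0.0, 0.0, 0.0], [1.0], [1, 1])
    print(w)                     # [0.5, 0.5, 0.0]
    print(predict(w, [1.0], 2))  # [1, 1]
```

Starting from zero weights, one update on x = [1.0] with annotation [1, 1] yields weights under which [1, 1] is predicted, and repeating the update on the same example changes nothing. Because each update touches only the new example, its cost does not grow with the accumulated training set, which is the property the online learner is meant to provide.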